Factoring 2048 RSA in 177 Days

Download as pdf or txt
Download as pdf or txt
You are on page 1of 18

Factoring 2 048 RSA integers in 177 days with 13 436 qubits and a multimode memory

Élie Gouzien∗ and Nicolas Sangouard†


Université Paris–Saclay, CEA, CNRS, Institut de physique théorique, 91 191 Gif-sur-Yvette, France
(Dated: March 11, 2021)
We analyze the performance of a quantum computer architecture combining a small processor
and a storage unit. By focusing on integer factorization, we show a reduction by several orders of
magnitude of the number of processing qubits compared to a standard architecture using a planar
grid of qubits with nearest-neighbor connectivity. This is achieved by taking benefit of a temporally
and spatially multiplexed memory to store the qubit states between processing steps. Concretely, for
a characteristic physical gate error rate of 10−3 , a processor cycle time of 1 microsecond, factoring
a 2 048 bits RSA integer is shown possible in 177 days with a processor made with 13 436 physical
qubits and a multimode memory with 2 hours storage time. By inserting additional error-correction
steps, storage times of 1 second are shown to be sufficient at the cost of increasing the runtime
by about 23 %. Shorter runtimes (and storage times) are achievable by increasing the number
of qubits in the processing unit. We suggest realizing such an architecture using a microwave
interface between a processor made with superconducting qubits and a multiplexed memory using
the principle of photon echo in solids doped with rare-earth ions.
arXiv:2103.06159v1 [quant-ph] 10 Mar 2021

Introduction — Superconducting qubits form the 1 logical qubit

building blocks of one of the most advanced platforms

temporal modes
for realizing quantum computers [1]. The standard ar-
chitecture consists in laying superconducting qubits in a
2D grid and making the computation using only neigh-
boring interactions. Recent estimations showed however
that fault-tolerant realizations of various quantum algo-

es
od
rithms with this architecture would require millions phys-

lm
ia
ical qubits [2–4]. These performance analyses naturally

at
sp
raise the question of an architecture better exploiting the
potential of superconducting qubits.

In developing a quantum computer architecture we


Processor
have much to learn from classical computer architectures.
Realizations using trapped ions for example combine pro-
Figure 1. Quantum computer architecture using a processor
cessing with storage units [5]. The authors of Ref. [6] re-
made with a 2D grid of qubits and a memory operating as a
alized that key quantum algorithms are mostly sequential qubit register where the address of each qubit is specified by
meaning that we may only need a small computing block a temporal and spatial index. Only (dressed) logical qubits
for all the qubits in the storage unit in this architecture. are represented; additional ancillary qubits are used for mea-
Ongoing experimental efforts aim at exploiting this idea suring the operators for error correction.
to reduce the number of superconducting qubits in the
standard approach to quantum computing by adding a
quantum memory implemented with spins or atoms [7– More precisely, we use 3D error correction codes [10]
9]. A detailed analysis of the performance of this hybrid in which the address of each (dressed) logical qubit is en-
architecture is however missing. coded into a 3D structure of physical addresses, two di-
mensions being encoded in space and one in time, see Fig-
ure 1. Error correction and logical gates are applied by
We here report on such an analysis by considering a sequentially releasing physical qubits corresponding to
quantum memory that can store multiple spatial and different “horizontal” slices (with different temporal in-
temporal modes. The memory can be thought as a qubit dexes) and by processing each slice (with the same tem-
register in which the address of each qubit is identified poral indexes) simultaneously.
by a temporal and a spatial index. When a given qubit
needs to be processed, its state is released and mapped We assess the performance of this architecture through
into the processor by means of a microwave field in a a version of Shor’s algorithm [11] proposed by Ekerå and
temporal and spatial mode corresponding to the qubit Håstad [12]. The algorithm is a threat for widely used
address. When the processing is done, the qubit state crypto-systems based either on the factorization [13] or
is mapped back to the memory and stored until another the discrete logarithm problem [14, 15]. It can also be
processing operation is needed. considered as a certification tool to check the proper func-
2

tioning of an actual quantum computer as its outcome mod N ) satisfies r > (p + q − 2)/2, computing the dis-
can be verified efficiently. Last but not least, the cost crete logarithm of h modulo N , as detailed later, yields
of its implementation has been evaluated using plausible d = (p + q − 2)/2. For large N , the assumption is ver-
physical assumptions for a large scale processor with a ified with a high probability [12]. Using N = pq and
standard 2D grid of superconducting qubits (a charac- d = (p + q − 2)/2, where N and d are both known, p
teristic physical gate error rate of 10−3 , a surface code and q can be recovered by choosing one of the solutions
cycle time of 1 µs, and a reaction time of 10 µs): it was of the second order equation N = p(2d + 2 − p), and then
estimated that it should be possible to factor a 2 048 bits exploiting q = 2d + 2 − p.
RSA integer in 8 hours with 20 millions qubits [2].
The discrete logarithm is computed in Ekerå and
By taking this cost estimation as a reference, we es- Håstad’s algorithm in three steps. First, the exponenti-
timate the approximate cost of implementing the same ation (e1 , e2 ) → g e1 h−e2 is applied once on two quantum
version of Shor’s algorithm in terms of physical process- registers prepared in a superposition of every possible
ing qubit number, multimode capacity, memory storage values of e1 and e2 respectively. Two quantum Fourier
time, and runtime. Our cost evaluation is given in the transforms are then applied independently to the two reg-
case where the processor is made with two (dressed) log- isters before being measured. Finally, a classical post-
ical qubit slices. Under the assumptions used in Ref. [2] processing extracts the discrete logarithm d of h modulo
for the gate error rate and the cycle time, we show that N from the measurement results. Because the measure-
it should be possible to factor 2 048 bits RSA integer in ments are performed directly after the Fourier transform,
177 days using a multimode memory with a storage time the cost of the exponentiation largely dominates the cost
of about 2 hours and a processor including 13 436 physical of Ekerå and Håstad’s algorithm, see Appendix A.
qubits — a reduction by more than 3 orders of magni-
tude of the number of physical qubit as compared to the Number of gates — The modular exponentiation
standard architecture without memory [2]. By insert- needed in Ekerå and Håstad’s algorithm, i.e. the opera-
ing additional error-correction steps, we show that the tion |ei |1i 7→ |ei |g e mod N i, with the input e and the
storage time can be significantly reduced at the cost of output g e mod N encoded on ne and n bits respectively,
a slight increase of the runtime. We also explain how can be decomposed into ne multiplications, each being
shorter runtimes and storage times are achievable at the decomposed into 2n controlled additions of integers of
cost of increasing the number of qubits in the processing typical size n and 1 controlled swap between two regis-
unit. We propose a realization of such an architecture ters of size n, giving a total number of 2ne n (ne ) con-
using a microwave interface between a processor made trolled additions (swaps between registers respectively),
with superconducting qubits and a multiplexed memory see Appendix B for a detailed explanation. Each addi-
using the principle of photon echo in solids doped with tion is a modular addition, which can be obtained with a
rare-earth ions embedded in cavities. standard adder circuit at the cost of a specific represen-
tation — the coset representation (see Appendix C) —
Principles of (a variant of ) Shor’s algorithm — We adding m additional qubits to the register. This means
consider the factorization of a number N = p × q con- that 2ne (n + m) controlled additions of integers of size
structed as the product of two prime numbers of similar n + m and ne controlled register swaps between regis-
sizes, p and q. We note n the number of bits involved ters of size n + m are needed for realizing Ekerå and
in the binary representation of N , that is 2n−1 ≤ N < Håstad’s algorithm. A controlled swap operation be-
2n . While no efficient classical factorization algorithm is tween two qubits can be performed using two CNOTs
known, Shor’s algorithm and its variants factor N with and one Toffoli gates. Hence, the total cost for controlled
a polynomial complexity into n [11, 12, 16–19]. swaps operating on two registers using n + m qubits is of
2(n+m) CNOTs and n+m Toffoli gates, see Appendix B.
The version of Shor’s factorization algorithm proposed For the controlled addition, we can use a semi-classical
by Ekerå and Håstad [12] starts by first selecting ran- adder whose mean cost for integers of size n + m is of
domly an integer g such that 1 < g < N and then 5.5(n + m) − 9 CNOTs and 2(n + m) − 1 Toffoli gates, see
checking that g falls in the multiplicative group of in- Appendix B. Given the number of gates in the controlled
tegers modulo N (g ∈ Z∗N ) using Euclid’s algorithm. If addition and swap operations, the number of additions
not, g is a prime factor of N . Once g is shown to be and swaps in the multiplication and the number of multi-
in Z∗N , the number h = g (N −1)/2 is computed. Given plications in the modular exponentiation, the cost of fac-
that the order of Z∗N is φ(N ) = (p − 1)(q − 1), we torization can easily be estimated, see Appendix B. This
have h = g (pq−p−q+1)/2 g (p+q−2)/2 ≡ g (p+q−2)/2 mod N cost can however be reduced using windowed arithmetic
where the last equivalence is the result of Chinese re- circuits [20]. The basic idea consists in grouping the bits
mainder theorem. Under the assumption that the order of e by blocks (in blocks including we bits) for control-
r of g (the smallest non-negative integer such that g r ≡ 1 ling each multiplication, hence reducing the number of
3

these multiplications. Similarly, for each multiplication, error probability on one logical qubit given in [25, eq. (4)]
the input bits are grouped (in blocks including wm bits)    
to reduce the number of additions composing each mul- p
plogical = A exp α log dβ (1)
tiplication. As detailed in Appendix D, the cost of expo- pth
nentiation is dominated in this case by 2 new(n+m)n
e wm
one
where A ≈ 0.033, α ≈ 0.516, β ≈ 0.822, p is the error
qubit gates, (2we +wm n + 12(n + m)) new(n+m) CNOTs,
2
e wm probability per physical qubit, d the code distance which
and 4 new(n+m)
e wm
Toffoli gates. We emphasize that this is a is related to the number of physical qubits per logical
first order estimation. In the code used to compute the qubits (see below) and pth the fault-tolerance threshold.
required resources and find the optimal parameters, the We take pth = 0.75 %, a value reported in Ref. [26] us-
complete formulae have been used [21]. ing a projection decoder. Note that estimations suggest
that the threshold could be higher, with a more efficient
Error correction — The error correction is achieved decoder [27].
using 3D gauge color codes, a family of subsystem codes
presented in Ref. [10]. A first code admits a transversal Architecture — For simplicity, the tetrahedral struc-
implementation of CNOT and Hadamard gates while a ture of the error correction (see Appendix E) can be in-
second code accepts a transversal implementation of the cluded into a large cube in which the physical qubits
non-Clifford T gate. Switching between the two codes are now represented by elementary cubes, see Figure 1.
gives a universal set of gates without the need for state The large cubes are stored into the memory and loaded
distillation [22], contrary to standard ways of operating by slices into the processor when they need to be pro-
the surface code [23]. cessed. We size the processor such that one slice of two
large cubes can be loaded simultaneously, which is con-
venient to perform two qubits gates efficiently. Each gate
The two codes are based on a shared geometrical struc- is immediately followed by an error correction round on
ture: a large tetrahedron constructed from elementary the processed qubits. This is done by reloading again
tetrahedrons (see Appendix E for details). A physical each slice sequentially in the processor and by measur-
qubit is attributed to each elementary tetrahedron. As ing the gauge generators (before recovering classically
in any subsystem codes, the stabilized subspace is split the code stabilizers), each of them using up to 6 two-
into a tensorial product of the (bare) logical and gauge qubit gates, one auxiliary qubit and one measurement
qubits (the dressed logical qubits include the bare logical of this auxiliary [22, 28]. Note that the codes of inter-
qubits and the gauge qubits). A set of operators — gen- est are 3D-local and the auxiliary qubits only needs to
erators of gauge operators — are measured, each being keep coherence for the time of loading and measuring
the product of (up to six) X (or Z) operators associated two successive slices for performing successfully a sta-
to qubits corresponding to the tetrahedrons sharing the bilizer measurement. Once the syndromes are obtained
same edge. From these measurements, the values of sta- and the errors are detected, the correction of these errors
bilizers of the two codes are deduced. In the code used is delayed and merged with the next operation applied
for implementing the H and CNOT gates, the stabiliz- on the qubit to be corrected. Further note that all-to-
ers are defined from the vertices, i.e. the product of X all connectivity between the logical qubits is achieved if
(or Z) operators associated to qubits corresponding to each physical address in the memory can be mapped to
the tetrahedrons sharing the same vertex. In the code three physical qubits in the processor: two for the imple-
used for implementing the T gates, the stabilizers are de- mentation of two-qubit gates (depending on whether the
fined from the vertices for the X operators and from the physical qubit is involved in the logical controlled or tar-
edges for the Z operators. The value of an operator rep- get qubits) and one for performing the error correction.
resented by a vertex is classically recovered by multiply- For achieving a code distance d, corresponding to a cube
ing the measurement results of combinations of specific height d, the number of physical qubits in the processor
edges ending at the given vertex. Several combinations 2
is nqubits = 2×2× 3d +2d−3
2 , corresponding to two logical
are possible giving redundancies that can be exploited to
qubit slices (see Appendix E) and including the ancillary
achieve fault-tolerant error correction with only one run
qubits (essentially one per physical qubit) needed for the
of measurements [24]. The structure of codes in which the
stabilizer measurements. For a code distance d, we ap-
stabilized subsystem is the tensor product of the gauge
proximate the time it takes to perform one (single qubit
and (bare) logical subsystems guarantees that the mea-
or two qubit) logical gate by 2(d − 2)tc where tc is the
surements of gauge operators don’t reveal the value of
cycle time of the 2D processor (time to load one qubit
the (bare) logical qubit (see Appendix E).
slice, to measure the stabilizers, which is longer than the
gate operation, and to reload the slice into the memory)
To account for the additional resource needed to im- and the factor 2 comes from the fact that the gate is
plement these codes, we use an estimation of the residual immediately followed by an error correction round.
4

14 and storage time can be reduced by increasing the size of


13
12
11
qubits the processor (see Appendix F). We also estimated that
month
10 28 millions spatial modes and 45 temporal modes need
qubit number (kiloqubit)

time
9
week to be stored. Note that the number of stored modes does
8

Expected time
7 not enter in the volume and is thus not optimized. Note
6 day
also that the qubit address in the memory can be iden-
5
tified by temporal indexes only at the cost of a longer
4 hour runtime when photon-echo type protocols are used, cf.
below for a concrete example.
3

Implementation — Our proposal provides a viable


2 minute
32 64 128 256 512 1024 2048
solution to get rid of the individual control of millions
n qubits but the challenge now relies on the realization
of an efficient multimode quantum memory. As shown
in Ref. [29], such a memory could be implemented us-
Figure 2. Number of qubits in the processor and runtime
ing a solid-state spin ensemble (N̄ spins with an in-
to factor of n-bit RSA integers with a computer architecture
using a multimode memory. homogeneous spectral broadening Γ), resonantly cou-
pled (with single spin coupling rate g) to a frequency
tunable single-mode microwave resonator (with damp-
Cost evaluation — To evaluate the resources required ing rate κ to an external transmission line). The res-
for integers factorization, we consider the total number of onator serves to enhance microwave absorption and re-
gates involved in the logical circuit. The total runtime for emission by the spins. In particular, unit efficiency ab-
one attempt is obtained by multiplying the gate number sorption of a microwave field can be realized if the fi-
by the time it takes to perform one gate, while the success nesse F of the resonator matches the single-path absorp-
−1
probability is deduced from the logical error probability tion αL of spins F = (αL) , i.e. if the cooperativity
2
(Equation (1)). Following Ref. [2], we consider a cycle C = gκγN̄ = (αL) × F = 1 [30]. Once absorbed, the mi-
time of tc = 1 µs and a mean error per physical qubit crowave field can be re-emitted by time-reversing the in-
and per gate of p = 10−3 . Note that the mean error homogeneous dephasing using a spin echo technique [31].
per gate now includes errors during the reading of and Detuning the resonator off and on resonance at the right
writing into the memory. time, the spin coherence is recovered, leading to a noise-
free, unit re-emission probability √
of the stored photon if
The cost evaluation is finally obtained by optimizing C = 1 [29]. In the regime κ  g N̄  Γ, the memory
the two window parameters wm and we , the coset rep- bandwidth is given by 4Γ [29], meaning that any input
resentation padding m and the code distance d in order with a spectrum, say, ten times thinner i.e. 4Γ/10 can be
to minimize the volume texp × nqubits . texp = pts is the stored with a close to unit efficiency. Furthermore, the
average time to obtain the result (several attempts might time duration during which an optical coherence can be
be necessary), with t the computation time per attempt preserved is limited by the inverse of the homogeneous
and ps the success probability. linewidth γh [29]. Assuming that the storage efficiency
is unchanged if the storage time is hundred times shorter
Results — The required resources to factor a n-bit than γh−1 , this means that the number of temporal modes
RSA integer are presented in Figure 2 and discussed in that can be stored with almost unit efficiencies is roughly
Appendix F. Our estimation suggests that the factoriza- given by Γ/(250γh ). Interestingly, a well-identified tem-
tion of a 2 048-bit integer corresponding to the most com- poral mode can be released while keeping all the other
mon RSA key size would be possible in about 177 days modes in the memory by detuning appropriately the res-
with a processor having only 13 436 bits. Concerning the onator off and on resonance with the spins at the cost of
memory, we estimated the maximum time between the introducing a deadtime between two readouts of half the
storage and readout of the same qubit to be less than duration of the stored train of pulses in average.
two hours. A memory with a storage time of at least
two hours is however not necessary as error-correction To give an idea of what could be realized in a near
steps can be implemented periodically at the cost of in- feature, we estimate that it should be possible to factor
creasing the runtime. Error correction of all the qubits 35 in about 1 min using the exact algorithm presented
stored in the memory is estimated to take 186 ms with here (with windowed arithmetic and 3D color codes) and
a processor having 13 436 bits, meaning that the storage a setup combining a memory for storing 38 logical qubits
time simply needs to be longer than 186 ms. Applying a (3 002 spatial modes and 5 temporal modes) and a pro-
correction every second for example would increase the cessor with 316 physical qubits (we estimate that more
runtime by about 23 %. Note also that both the runtime than 60 000 qubits would be needed with a standard 2D
5

grid and surface code). If instead of using a spatially and right after the Hadamard provided that the following
temporally multiplexed memory, the qubits are stored in phase gates are classically controlled by the result of this
the same spatial mode and are identified by (6 650) tem- measurement, see Figure 3c. In this case, the successive
poral addresses only, we evaluate the same factorization classically controlled phase gates operating on the same
to be possible in about 1 day using a memory bandwidth qubit can be merged together, leading to a circuit with
4Γ = 2π × 48 MHz and taking into account the corre- one phase gate, one Hadamard gate and one measure-
sponding deadtime between two memory readouts. In ment per qubit. When this semi-classical Fourier trans-
this case, error correction of all the qubits stored in the form operates on a register made with ne qubits (the
memory is estimated to take 132 ms meaning the stor- number of bits of the exponent), its cost is linear in ne
age time needs to be longer than 132 ms. For a memory and is thus negligible compared to the cubic complexity
bandwidth 4Γ = 2π × 120 MHz, the same factorization of the exponentiation.
would take 9 hours and error correction is estimated to
take 53 ms. As discussed in Appendix G, these require-
ments can realistically be met with a realization of the
quantum memory protocol described before combining a Appendix B: Decomposition of the exponentiation
solid doped with rare-earth and a superconducting mi- into elementary gates
crowave resonator [32–34].

Conclusion — We have shown that the use of a quan- In this appendix, we aim to give a clear view of how to
tum memory for quantum computing is appealing as un- decompose the modular exponentiation into elementary
processed qubits can be loaded into the memory which gates. The presented method is intended to be simple
significantly reduces the size of the processor. All-to-all to understand, but not optimal. A more efficient one is
connectivity between physical qubits can be obtained if presented in Appendix D.
each address in the memory can be mapped to arbitrary
qubits in the processor, hence offering many opportuni-
ties for error correction and for implementing algorithms
with gates operating between non-neighbouring qubits. 1. Decomposition of a modular exponentiation into
The effective improvement of the coherence time of un- additions
processed qubits is however not crucial, because error
correction is exponentially efficient, that is, a small in-
crease of the code distance can compensate a significant The modular exponentiation needed in Ekerå and
reduction of coherence time. Håstad’s algorithm, i.e. the operation |ei |1i 7→
|ei |g e mod N i, with the input e and the output g e
We acknowledge M. Afzelius, J.-D. Bancal, E. Flurin, mod N encoded on ne and n bits respectively, can be
P. Bertet, P. Sekatski, X. Valcarce and J. Zivy for stim- implemented from controlled modular additions as we
ulating discussions and/or for critically reviewing the show now. For simplicity, we omit the modulo in this
manuscript. We acknowledge funding by the Institut de paragraph.
Physique Théorique (IPhT), Commissariat à l’Énergie
Atomique et aux Energies Alternatives (CEA) and the Let ene −1 . . . ei . . . e0 be the binary form of e. The ex-
Region Île-de-France in the framework of DIM SIRTEQ. ponentiation can first be seen as a sequence of multipli-
cations
e −1
nY e −1 h
nY iei
i i
Appendix A: Semi-classical Fourier transform ge = g2 ei
= g2 (B1)
i=1 i=1

We here discuss the semi-classical Fourier transform where each multiplication is controlled by the bit value
presented in [35] and show that its cost is negligible. ei . Figure 4 shows an implementation of such a mul-
The standard way to perform the Fourier transform on tiplication in which a quantum register encoding the
i
ne qubits is shown in Figure 3a: it requires a sequence of integer x ends up into an encoding of x × g 2 ei . It
one Hadamard and controlled phase gates for each qubit. uses two controlled product-additions, i.e. the opera-
In Shor’s algorithm as well as in Ekerå and Håstad’s ver- tion letting |yi |zi unchanged if |ei i = |0i and map-
i
sion of Shor’s algorithm, the qubits are measured right ping |yi |zi into |yi |z + y × γi ((y, z, γ) → (x, 0, g 2 ) for
after the Fourier transform, hence explaining the mea- the first product-addition appearing in Figure 4 and
i
surements of each qubit at the end of gate sequences in (y, z, γ) → (x̄, x, −g −2 ) for the second one, where the
Figure 3a. The simple rearrangement presented in Fig- negative power stands for multiplicative inverse modulo
ure 3b shows that the measurement can be performed N ) when |ei i = |1i. In case |ei i = |1i, the mapping
6

(a) Standard quantum Fourier transform.

H • ··· • ···

φ2 · · · H • ··· •
··· φ2 · · · H •

· · · φne · · · φne −1 φ2 H

(b) Rearranged Fourier transform.

H • • ··· • ···

φ2 H • ··· • ···

φ3 φ2 H ··· ··· •

··· φ ne φne −1 · · · φ2 H

(c) Semi-classical Fourier transform. The successive classically controlled rotation gates can
be merged.

H • • ··· • ··· +3

φ2 H • ··· • ··· +3

φ3 φ2 H ··· ··· • +3

··· φ ne φne −1 · · · φ2 H

Figure 3. Different versions of the Fourier transform followed by measurements. They are used to convince the reader that the
number of gates in the Fourier transform is negligible with respect to the cost 
of the exponentiation. These three versions are
2πi

based on the phase gates φk defined as a 2 × 2 matrix with diagonal elements 1, e 2k and zeros off diagonal. Note that the
control and target qubits can be reversed in the representation of each controlled phase gate without changing the result.

|ei i • |e
ii i E
2i 2 ei them controlled both by the values of bits |yj i and |ei i.
|xi / ×g Figure 5 shows explicitly the decomposition of the first
xg
= product-addition appearing into each multiplication of
|ei i • • • |e
ii i E
i 2 ei the exponentiation.
|xi / Input x +x̄(−g −2 ) × xg
i
|0i / +xg 2 Input x̄ × |0i
|ei i • |ei i • ··· • ···
Figure 4. Principle of a modular multiplication circuit trans- |x0 i • ··· ···
..


forming a quantum register encoding the integer x into a state 
··· ···
i |xi Input x = .
encoding x × g 2 ei . A first product-addition |xj i ··· • ···
E operation trans-

..

··· ···
i

forms auxiliary qubits in |0i into x × g 2 if |ei i = |1i. Then, .

i i
i i / +xg 2 / +20 g 2 · · · +2j g 2i · · ·
a product-addition applies +x̄(−g −2 ) with x̄ = x × g 2 into
the register encoding x if |ei i = |1i. A final swapping is ap-
Figure 5. Decomposition of the first product-addition appear-
i
E
plied if |ei i = |1i to put the quantum register into x × g 2 ei

ing in each element of the decomposition of the exponentiation
and resets the auxiliary qubits to |0i. Note that all the oper- into multiplications, see Figure 4.
ations are performed modulo N .

is performed by considering the binary representation We deduce that the modular exponentiation requires
yn−1 . . . y0 of y and by rewriting the product as ne multiplications, each being decomposed into 2n con-
n−1 n−1 trolled additions and 1 controlled swap between two reg-
isters, giving to a total number of 2ne n (ne ) controlled
X X
γ2j yj =
 j
y×γ = γ2 yj . (B2)
j=0 j=0 additions (swaps between registers respectively). Each
addition needs to be modular, which can be obtained
As yj is either 0 or 1, the controlled product-addition with a specific representation and a standard adder cir-
can be implemented by a sequence of additions, each of cuit.
7

2. Coset representation (a) Doubly controlled semi-classical addition


|ei i • •
|xj i • •
|0i • • • • • • • • • |0i
The basic idea of the coset representation for adding (2j γ)0 • • •
2j γ to a quantum register encoding the integer z is to |z00 i • • |s0 i
extend the register for z with m additional qubits and to |0i • • • • • |0i
2m −1 (2j γ)1 • • • • •
encode it into the state √21m
P
|z + kN i. Except at |z10 i • • |s1 i
k=0
the bounds, this state is invariant under the addition of |0i |0i
N . This implies: •
(2j γ)2 •
|z20 i |s2 i
m
2X −1
1 z + 2j γ + kN mod 2n+m

√ (b) Controlled swap
2m k=0
• •
m
× = •
2X −1
1 × • •
z + 2j γ

≈√ mod N + kN
2m k=0 Figure 6. Controlled operations. (a): semi-classical adder
taking each bit of the classical value 2j γ = 2k=0 2k (2j γ)k and
P
i.e. the modular addition of 2j γ in the register of z can the three qubits register encoding z 0 as inputs and returning
(2j γ)k and |sk i = (z 0 + 2j γ · ei · xj )k . The block in the

be performed with a standard adder, at the cost of a
small error which is exponentially suppressed when in- dashed box uses in average 5.5 CNOTs and 2 Toffoli gates.
creasing m [36]. Note that the resource needed to initial- (b): Fredkin gate implemented with a Toffoli and two CNOT
gates. The controlled swap between registers (as required in
ize the register is negligible with respect to the resource Figure 4) is obtained by applying it to each pair of qubits.
taken to implement the adder, see Appendix C. Taking
into account the increase in register size, this means that
2ne (n + m) controlled additions and ne controlled reg- 4. Number of gates
ister swaps are needed for realizing Ekerå and Håstad’s
algorithm.

Given the number of gates in the controlled addition


and swap operations, the number of additions and swaps
3. Controlled operations in the multiplication and the number of multiplications
in the modular exponentiation, we estimate that factor-
2
ization takes at leading order 11ne (n + m) CNOTs and
2
A controlled swap operation between two qubits (Fred- 4ne (n + m) Toffoli gates.
kin gate) can be performed using two CNOTs and one
Toffoli gates, see Figure 6b. Hence the total cost for
controlled swaps operating on two registers prepared in
the coset representation of integers (encoded each with Appendix C: Coset representation
n + m qubits) is of 2(n + m) CNOTs and n + m Toffoli
gates.

For the controlled addition, note first that since we Modular addition is typically implemented with vari-
use the coset representation of integers, a circuit for con- ants of the addition: an addition, a comparison,
trolled addition modulo a power of two is sufficient to im- a controlled correction and the clean-up of ancillary
plement a controlled modular addition. Such an addition qubits [39]. As exposed in main text, coset representa-
can be implemented with the semi-classical adder pre- tion of integers, introduced by Zalka [40] and formalized
sented in Figure 6a, which is inspired by Refs. [3, 37, 38]. by Gidney [36], can be used to approximate the modular
It shows the basic circuit taking a classical value 2j γ and addition with a single standard adder circuit.
a register encoding z 0 and returning 2j γ and z 0 +2j γ if the
two controlled qubits |ei i and |xj i are both in state |1i. The basic idea of the coset representation for adding
When such an addition is applied on a quantum register 2j γ modulo N to a quantum register encoding the in-
encoding z 0 using n + m qubits, the block in the dashed teger z is to extend the register for z with m additional
box of Figure 6a is repeated n + m − 2 times, giving a 2m −1
qubits and to encode it into the state √21m
P
|z + kN i.
mean cost of 5.5(n+m)−9 CNOTs and 2(n+m)−1 Tof- k=0
foli gates. Except at the bounds, this state is invariant under the
8

|0i H • H |0i H • H |0i H • H


n
|zi / ...
m +N −1 if x ≥ N +2N −1 if x ≥ 2N +2m−1 N −1 if x ≥ 2m−1 N
|0i / ...

2m
P−1
Figure 7. Preparation proposed in [36, Fig. 1] of a quantum register with n + m qubits in the state √1 |z + kN i as
2m
k=0
requested in the initialization of the coset representation. The first controlled operation adds the integer N to the register
made with n+m qubits provided that the ancillary qubit is in state |1i. The first classically controlled operation aims to change
the phase of the input state encoded in n + m qubits if and only if the result of the measurement is 1 and the number encoded
in the n + m qubits is larger or equal than N . In case one of the two conditions is not met, the input state is unchanged.

addition of N . This implies prepared in the state |0i+|1i



2
(m ancillary qubits in total)
which is then uncomputed, see Figure 7. The controlled
m
2 −1 addition is performed using the circuit presented in Fig-
1 X
z + 2j γ + kN mod 2n+m

√ ure 6a with only one control qubit. The uncomputation
m
2 k=0
of the ancillary qubit is based on a measurement and
m
2 −1 depending on the result, a conditioned correction is re-
1 X
z + 2j γ

≈√ mod N + kN (C1) alized, see Figure 7. Let us detail the uncomputation of
2m k=0
the first ancillary qubit presented in Figure 7. When the
result of the measurement is 0, the register made with
i.e. the modular addition of 2j γ in the register of z can be
n + m qubits is projected into √12 (|zi + |z + N i). When
performed with a standard adder (modulo 2n+m ), at the
cost of a small error which is exponentially suppressed the result is 1, the register state is √12 (|zi − |z + N i)
when increasing m [36]. Note also that the precision is and the operation −1 needs to be applied to the compo-
improved if instead of adding 2j γ, one adds 2j γ mod N nent |z + N i, i.e. when the state of the register encodes
(which does not change the result of the sum since we an integer larger than N . In order to implement the
consider the sum modulo N ). This is possible each time conditioned operations for decomputing the m ancillary
the quantity to add is known classically. qubits, we need to compare the value x encoded in the
quantum register of size n + m and an integer y known
The goal of the first subsection is to show that the classically satisfying 0 < y ≤ 2m−1 N < 2n+m (see the
resource needed to extend the register encoding |zi into last umcomputation in Figure 7) i.e. that can be writ-
2m −1 ten with n + m bits. This comparison is implemented
the state √21m
P
|z + kN i, as requested in this repre- using the circuit presented in Figure 8a. First, the value
k=0
sentation, is negligible with respect to the resource taken 2n+m − y = y 0 is computed classically. Then the last
to implement the modular exponentiation. In the second carry of the sum of x and y 0 is computed with a circuit
subsection, we show that the coset representation is com- derived from the addition. If the value of this carry is 1,
patible with the modular multiplication circuit presented we conclude that x ≥ y, otherwise x < y. A Z gate is
in the main text. thus applied on the qubit encoding the last carry, before
uncomputing the carries. The register ends up in state
In the two next subsections, the coset representation ± |xi depending on the relative value between x and y,
is considered for additions modulo N ; n is the number of as desired.
bits encoding N , and m the number of qubits added to
the register for the coset representation. Each controlled addition and correction costs
O(n + m) gates. This operation is repeated m times,
giving a total cost of the coset representation initializa-
tion of the order O(m(n + m)).
1. Initialization
In the modular exponentiation algorithm, the two
registers at the bottom of Figure 4 need to be pre-
Starting from a register with n qubits in state |zi, pared initially in x = 1 and 0 respectively. Initializing
2m −1
the initialization of the coset representation consists in them in the coset representation √21m
P
|1 + kN i and
2m −1
k=0
preparing the state √21m
P
|z + kN i in an extended 2m −1
√1
k=0
P
|kN i takes O(m(n + m)) gates which is negligi-
register of size n + m. This is done by performing suc- 2m
k=0
cessive additions, each controlled by an ancillary qubit ble compared to the cubic cost of the full exponentiation.
9

(a) Semi-classical comparison and correction 2m −1 2m −1 E


2i
1
xg + k 0 N thanks
P P
x0 • • x0 very close to 2m |x + kN i
y00 • • y00 k=0 k0 =0
• • • • • • to the coset representation itself, cf. Equation (C1). The
x1 • • x1 latter corresponds to the desired state.
y10 • • • • y10

• • • • • • Appendix D: Windowed arithmetic


x2 • • x2
y20 • • • • y20
Z In order to reduce the number of multiplications and
additions in the exponentiation algorithm, we use win-
(b) AND computation (c) AND uncomputation dowed arithmetic circuits [20]. They consist in grouping
• • • •
• = • • = • the bits of e for controlling each multiplication, hence re-
|0i |0i ducing the number of multiplications. Similarly, for each
multiplication, the input bits are grouped to reduce the
Figure 8. (a): Circuit inspired from [3, Fig. 17] which com- number of additions composing each multiplication.
pares the integer x encoded in n + m qubits and the integer
y < 2n+m known classically, and returns − |xi if and only if
The next subsection shows the details of the decompo-
x ≥ y. This is done in three steps: i) compute the carries of
y 0 + x with y 0 = 2n+m − y, ii) apply a Z operation on the last sition of the exponentiation into elementary additions of
carry and iii) uncompute the carries. the form +Tk mod N where the quantity Tk depends on
(b) and (c): circuits defining the notations used to compute the value of an integer k. These specific additions are im-
and uncompute an AND operation, as introduced in [38, 41] plemented in three steps, that are presented in separated
where the authors give efficient implementations in terms of subsequent subsections.
T (or π4 ) gates. When only one quantum control appear, it
uses a CNOT instead of a Toffoli gate, and it can be removed
by directly using the control bit instead of the ancillary.
1. Windowed exponentiation and multiplication

Note however, that the cost of this initialization is taken


into account in our script for the evaluation of the whole Let us start by specifying the notations. We label the
algorithm cost. binary form of e as
ei:i+w
z }| e {
ene −1 . . . ei+we ei+we −1 . . . ei . . . e2 e1 e0 (D1)
2. Compatibility with the multiplication i.e. ej is the jth bit of e. Let also ei:i+we be defined as
i+w
X e −1

ei:i+we = 2j−i ej (D2)


When computing the multiplications from sequences j=i
of two product-additions (see Figure 4 of main text), the
input register encoding x and the ancillary register are i.e. ei:i+we is the number whose bit decomposition is
used both as control and target of the product-additions. given by the bits of e starting at index i and taking
We here check that having the control register encoded in we bits. The strategy for computing the exponentiation
the coset representation is not a problem for performing using windowed arithmetic consists in decomposing ex-
the multiplication. ponent e in terms of numbers ei:i+we
X
e= 2i ei:i+we , (D3)
Let us consider the first product-addition used
0≤i<ne
to implement the multiplication shown in the bot- i≡0 mod we
tom part of Figure 4. In the coset representation,
such that
the input x and ancillary registers are in the state
2m −1 2m −1 i
Y
1
P
|x + kN i
P
|0 + k 0 N i meaning that af- ge = g2 ei:i+we
. (D4)
2m
k=0 k0 =0 0≤i<ne
ter the product-addiction, their state ends up in i≡0 mod we
2m −1 2m −1 E
1
m
P P
|x + kN i

+ kN )g 2i
+ k 0
N mod 2 n+m
. The comparison with the decomposition of g e presented
2 (x
k=0 k0 =0 in Equation (B1) clearly shows that windowed exponen-
i
As kN g 2 + k 0 N is a multiple of N , the obtained state is tiation divides the number of multiplications by we .
10

(a) Multiplication for the exponentiation, with i = 2 and we = 2, decomposed into product-additions.





|ei Input ei:i+we Input ei:i+we Input ei:i+we

 =

 i

+x̄ −g −2 ei:i+we
i
|xi / ×g 2 ei:i+we
mod N |xi / Input x mod N

i
|0i / +xg 2 ei:i+we
mod N Input x̄ h0|

(b) Windowed product-addition, as needed in (a), with the windows size wm = 3.


we we
/ Input ei:i+we / Input ei:i+we Input ei:i+we

Input x0:3

Input x
=
Input x3:6

i i i
/ +xg 2 ei:i+we
mod N / +20 x0:3 g 2 ei:i+we
mod N +23 x3:6 g 2 ei:i+we
mod N

(c) Modular addition of a number read with a table lookup, as needed in (b).
we + wm we + wm
/ Input k / Input k Input k
n |Tk i
= |0i / Load Tk Input Tk Unload Tk |0i

/ +Tk mod N / +Tk mod N

Figure 9. Windowed arithmetic subcircuits for the modular exponentiation. When not specified, the register size is n+m qubits
(register encoded into the coset representation of integers).

As for the standard algorithm, the multiplications Loading a value Tk into a quantum register is done
of the product (D4) are implemented successively and using a quantum table lookup circuit which we discuss
each multiplication is decomposed into a sequence of two right after. The subsequent subsection is dedicated to
product-additions, as shown in Figure 9a. The difference the task aiming to unload the value Tk and reset the
is that the added number now depends on the number register in state |0i. The last subsection is dedicated to
ei:i+we . the requested addition.

The product-addition is also performed in a windowed


way [20]. Figure 9b shows in particular how the first
product-addition needed for each multiplication is per- 2. Table lookup
formed using windows for input x of size wm = 3.

Figure 9c finally shows the implementation of an addi- The quantum table lookup proposed in [41], pro-
tion +Tk mod N of a quantity Tk that depends on the duces the following operation on basis states: |ki |xi 7→
value k. It requires three steps. First, the number Tk is |ki |x ⊕ Tk i with ⊕ the bitwise XOR operator. For state
loaded into an ancillary register. Second, this number is preparation, as required in the first step of the opera-
unconditionally added to the desired register and finally tion presented in Figure 9c, the target register starts in
the ancillary register is cleaned up. Note that the value
i
the state |0i such that control and target registers end
of Tk (given by Tk1 ,k2 = 2i k1 g 2 k2 , with k1 = ei:i+we and in |ki |Tk i. The circuit presented in Figure 10 shows the
k2 = xi:i+wm , k being the concatenation of k1 and k2 ) to principle of this operation with registers for k and Tk
be added is known classically. Its addition being realized composed respectively of 3 and 5 qubits.
modulo N , its value can be computed modulo N before
being loaded. n bits are thus sufficient to encode Tk . Concretely, the numbers Tk specify the set of controlled
11

k2 • • • • • • ural way to do this mapping is to apply again the lookup


k1 • •
• • • • • • • • • • • • operation described in the previous subsection. Since
k0 • • • • the lookup operates on the computational basis follow-
• • • • • • • • ing |ki |xi 7→ |ki |x ⊕ Tk i where ⊕ standsP for the bit-
wise XOR operator, by linearity it maps αk |ki |Tk i 7→
? ? ? ? ? ? ? ?
|0i P k
|0i
? ? ? ? ? ? ? ? αk |ki |0i, the latter corresponding to the desired state
? ? ? ? ? ? ? ? k
|0i when simply discarding the qubits previously encoding
? ? ? ? ? ? ? ?
|0i the numbers Tk .
? ? ? ? ? ? ? ?
|0i
T7 T6 T5 T4 T3 T2 T1 T0 However, a more efficient measurement-based tech-
nique is possible, as shown in Ref. [42, Appendix C] and
Figure 10. Example of a quantum table lookup. For a basis
state |ki specifying the address of the number Tk from a clas- improved in Ref. [20]. The principle consists in starting
sical table, the quantum table lookup maps basis states |ki |0i by measuring the register encoding Tk in the X basis be-
into |ki |Tk i. Here k and the output are composed respectively fore applying a phase shift conditioned on the result of
of 3 and 5 qubits. The notations for the AND computation measurements. For a more detailed explanation, let us
and uncomputation is presented in Figure 8. Black and white start to expand the qubits encoding the numbers Tk in
circles are controls on the |1i and |0i states respectively. The bits indexed by j ((Tk )j being the jth bit of Tk ). The
question mark on the controlled NOT means that a controlled
state before the uncomputation can be written as
NOT is applied on qubit i only when the ith bit of Tk takes
O E
the value 1.
X
αk |ki (Tk )j . (D5)
j
k j

NOT to be used (the question mark on the controlled Let us now focus on a specific qubit indexed by j ∗ . We
NOT means that a controlled NOT is applied on qubit label K0 = {k | (Tk )j ∗ = 0} and K1 = {k | (Tk )j ∗ = 1}.
i only when the ith bit of Tk takes the value 1). The The state before the uncomputation can be rewritten as
circuit operating on the bits ki of k prepares the last an-  
cillary qubit (line 5 from the top) in the state |1i at the X O E
time (specified by k) where the gates corresponding to
 αk |ki (Tk )j  |0ij ∗
j
Tk are applied, and |0i otherwise. The building block k∈K0 j6=j ∗
 
of the circuit is boxed in Figure 10. It uses 1 CNOT, 1 X O E
AND computation and uncomputation. Given that k is + αk |ki (Tk )j  |1ij ∗ . (D6)
j
encoded into the number of bits we + wn and can thus k∈K1 j6=j ∗
take 2we +wn different values, the number of blocks in
Pm −1 j
we +w By applying a Hadamard gate on the j ∗ th qubit, we ob-
the upper part of Figure 10 is given by 2 = tain
j=1
O E
2we −wn − 2. This means that 2we +wm − 2 CNOT gates,
 X 
αk |ki (Tk )j
2we +wm − 2 AND computations and uncomputations are 1 
 k∈K0 j6=j ∗
j 
√  E  |0ij ∗

needed to implement these blocks. Moreover, the num- 2 +
X O
ber of controlled multi-NOT gates to load the value Tk α |ki k (T )  k j
j
k∈K1 j6=j ∗
is given by 2we +wm , each gate being decomposed into  X O E 
n/2 CNOT in average since Tk takes n bits. When αk |ki (Tk )j
including the (two) NOT gates operating on the high- j
1 
 k∈K0 j6=j ∗

+√   |1ij ∗ . (D7)

est bit of k, we conclude that the table lookup uses 2 −
X O E
2 NOT gates, 2we +wm − 2 + 2we +wm −1 n CNOT gates, αk |ki (Tk )j 
j
k∈K1 j6=j ∗
2we +wm −2 AND computations and uncomputations (cor-
responding to 2 × (2we +wm − 2) Toffoli gates). Hence, if the measurement of the j ∗ th qubit yields 0, the
qubit is properly uncomputed. If the result is 1, a phase
shift needs to be applied on states corresponding to the
indexes k ∈ K1 .
3. Table unlookup
This uncomputation is successively applied to all the
qubits encoding the numbers Tk . Let tj be the mea-
The purpose of the table unlookupPoperation (last surement result of the jth qubit. The state after all the
step in Figure 9c) is to map the state αk |ki |Tk i into measurements is given by
k X
P
αk |ki, where αk are some complex coefficients. A nat- αk σk |ki , (D8)
k k
12
Q tj (Tk )j
with σk = (−1) . We now label k:s in an ancillary register in the unary representation: a
j register with 2s qubits representing a number k:s with the
state of the qubit number k:s being |1i and all the other
K = {k | σk = −1}. (D9)
qubits in the state |0i. The qubits in state |ks: i and the
In order to recover the desired state, we need to correct ancillary qubits are then used as control and target qubits
selectively the phase of terms |ki for which k ∈ K. for a lookup circuit where the controlled multi-NOT gates
are replaced by controlled multi-Z gates. Finally, the
s
|k:s i / Input k:s Input k:s ancillary register is uncomputed. The circuit used to
we + wm − s initialize the ancillary register in shown in Figure 12a.
|ks: i / Input ks: The one used to put it back in its initial state is given
2s in Figure 12b.
Init unary / H ⊕Fks: H Deinit unary

Figure 11. Representation of the four steps proposed in Starting from x encoded in s qubits, the conversion to
Ref. [20] to selectively change the phase of components |ki the unary representation takes 1 NOT gate, 2s −1 CNOT
in the state given in (D8) when the index k belongs to gates and 2s − 1 AND computation. The conversion
K (D9). The central operation is a table lookup with the back to the binary representation takes 1 NOT gate,
s

values Fks: =
2P −1
2j δ(j + 2s ks: ) where δ() is the indicator
2s − 1 CNOT gates and 2s − 1 AND uncomputation.
j=0 Given that k is encoded in we + wm bits, and that a
function of K. choice s = we +w

2
m
is judicious to minimize the num-
ber of gates, the change of phase of components |ki
takes 2b 2 c+1 + 4 single qubit gates, 2we +wm −1 +
we +wm
(a) Binary to unary (b) Unary to binary
2b w +w c+1 + 2d we +w e − 4 CNOTs and 2b we +w c+
we +wm m m
conversion conversion 2 2 2

 •

 • 2d e
2
m
e − 3 ANDs (1 NOT gate, 2 b we +wm
2 c − 1 CNOT
|xi  • • |xi  • •
gates and 2b 2 c − 1 AND computation for the
we +wm
• • • • • • • •
unary conversion, 2 × 2b 2 c Hadamard gates around
we +wm
|0i • • • • • • |0i
the table lookup, 2 NOT gates, 2d 2 e − 2 +
we +wm
• • • • • •
2we +wm −1 CNOT gates, 2d 2 e − 2 AND computa-
we +wm
• • • •
• • • • tions and uncomputations for the lookup circuit and
1 NOT gate, 2b 2 c − 1 CNOT gates and 2b 2 c −
we +wm we +wm
• •
• •
• • 1 AND uncomputation for the binary conversion). In-
• • cluding the additional n Hadamard gates and n mea-
Figure 12. (a): representation of the circuit proposed in surements on Tk , we conclude that the table unlookup
takes 2b 2 c+1 + n + 4 single qubit gates, 2we +wm −1 +
we +wm
Ref. [20] for preparing a copy in a ancillary register of an inte-
ger x in a unary representation starting from an encoding of
2b w +w c+1 + 2d we +w e − 4 CNOTs and 2b we +w c+
we +wm m m
2 2 2
x in a control register in the binary representation. The first
2d e
2
m
e − 3 ANDs.
not operation prepares the first qubit in the ancillary regis-
ter in state |1i. The first AND computation writes the result
of an AND operation between the first bit of x and the bit
1 encoded in the first qubit of the ancillary register into the
second qubit of the ancillary register. In case the state of the 4. Standard adder
latter is |1i, the state of the first qubit of the ancillary regis-
ter is changed to |0i. The combination of AND and CNOT
operations is successively repeated until the desired qubit of
As we use the coset representation of integers with win-
the ancillary register is in state |1i. (b): representation of the
circuit proposed in Ref. [20] to erase the value in the ancillary dowed arithmetic operations, a circuit for unconditional
register while keeping the integer x into the control register. addition modulo a power of two is sufficient to imple-
The circuits (a) and (b) corresponds to the first and third op- ment a modular addition. The adder we use, which is
erations needed for the selective phase correction operation described in [38] and optimized from [37] for use with
presented in Figure 11. T gates, is presented in Figure 13. It is thrifty in gate
number and ancillary qubits, at the cost of being deeper
The selective phase correction is done in four steps [20], than other circuits [43, 44], which is not a disadvantage
as shown in Figure 11. First, the control register which for our architecture.
uses we + wm qubits in state |ki is split in two groups.
The first group is made with s qubits in state |k:s i. The As presented in Figure 9c, the adder needs to add
second group takes the remaining we + wm − s qubits in a number Tk taking n qubits into a register with n +
state |ks: i, such that |ki = |ks: i ⊗ |k:s i and k = k:s + m qubits. To achieve this, either the first register for Tk
2s ks: . The second step consists in writing the integer is extended with qubits in the |0i state, either we use
13
x0 • • • x0
y0 • • (y + x)0 that the swap operation is realized by simply relabel-
• • • • ing the register, hence is for free. According to the
x1 • • • x1 counts obtained from previous subsections, the cost of
y1 • • (y + x)1 the exponentiation is dominated — in the limit n → ∞,
ne = O(n), we and wm constant — by: 2 new (n+m)n
e wm
one
• • • • qubit gates, (2we +wm n + 12(n + m)) new(n+m) CNOTs,
x2 x2 e wm
• • • 2

y2 • • (y + x)2 and 2 new(n+m)


e wm
AND computations and uncomputations
2
• (translatable into 4 new(n+m)
e wm
Toffoli gates). Note that
x3 • x3 when considering the universal gate set T , S, H, X, Y ,
y3 (y + x)3 Z, CNOT, controlled-Z and their conjugate, according to
Fig. 4 of Ref. [41] the AND computation and uncomputa-
Figure 13. Adder modulo 24 from [38], using the same nota-
tions as in Figure 8. The building block (boxed) is repeated tion costs in average 8 one qubit gates and 3.5 two qubits
two times, for the qubits numbers 1 and 2, while the first and gates. The total cost of the exponentiation is hence given
last use a simplified subcircuit. at the leading order by 2 new (n+m)n
e wm
(9n + 8m) one qubit
gates and (2we +wm n + 19(n + m)) new(n+m)
e wm
two qubit
gates. In the code used to compute the required resources
carry propagation blocs for the last qubits. Such blocs
and find the optimal parameters, the complete formula
are identical to the ones of semi-classical adder with clas-
have been used [21].
sical input 0; see [3, Fig. 17] for an example of such a
circuit. For gate counting, the first solution is taken into
account.

The cost of the addition circuit (Figure 13) is 6(n + Appendix E: Error correction
m) − 9 CNOT gates and n + m − 1 AND computations
and uncomputations.
This appendix is dedicated to 3D gauge color codes.
The first subsection is dedicated to the principle of sub-
system codes. The second subsection describes the ge-
5. Cost estimation
ometrical structure of 3D gauge color codes. The last
subsection provides a detailed description of the cut of
the code structure that is used to process and correct the
In summary, the parameters of the logical circuit for logical qubits.
computing the modular exponentiation are

n number of bits of the exponentiated number g


1. Subsystem codes
ne number of bits of the exponent e

we window size for the exponentiation


Subsystem stabilizer codes [46] are defined by three
wm window size for the multiplication subgroups of the Pauli group: the stabilizer, gauge and
m number of qubits added by the coset representation logical (also designated as bare logical in [22]) operator
groups, such that the stabilizer group is the center of the
gauge group up to phases, i1 is included in the gauge
The aim of this subsection is to give an estimate of the group, the operators from the gauge and logical groups
number of gates needed to implement this circuit. In or- commutes, and the normalizer of the stabilizer group is
der to keep the evaluation independent of the error cor- the product of gauge and logical groups. We invite the
rection choice, we express the cost in terms of the number reader to look at Refs. [22, 46] for an explicit construction
of single qubit, two qubit gates and AND computation of those groups from canonical generators of the Pauli
and uncomputation [45]. group. The stabilizer group plays the standard role of
stabilizers, i.e. divides the total Hilbert space H into a
The modular exponentiation consists in ne /we multi- direct sum of orthogonal subspaces C ⊕ C ⊥ where C —
plications, each multiplication using 2 product addition the stabilized subspace — corresponds to the eigenspace
and a swap and each product addition is implemented +1 of all stabilizers. The gauge and logical groups de-
with (n+m)/wm lookups, additions and unlookups. Note compose the stabilized subspace C into a tensor product
14

of the logical qubits space A and the gauge qubits space The measured operators are the gauge generators for
B [47], that is, the Hilbert space is decomposed as the code used to implement the H and CNOT gates
— the (1, 1) code (see [22]). These generators are de-
H = (A ⊗ B) ⊕C ⊥ . scribed by the edges: each operator is the product of X
| {z }
C or Z operators of the elementary tetrahedrons adjacent
to a given edge (each operator implies up to 6 physical
The gauge group acts trivially on the logical qubits and qubits).
is the Pauli group of the gauge qubits while the logical
group acts trivially on the gauge qubits and is the Pauli The stabilizer generators of the (1, 1) and (1, 2) codes
group of the logical qubits (up to phases). This ensures (the (1, 2) code refers to the code used to implement the T
that gauge operator measurements don’t modify the log- gate [22]) which are described by the vertices and edges,
ical qubits. are deduced from the values of measured operators. More
precisely, the operator corresponding to a vertex can be
A gauge fixing operation consists in switching from a written as the product of the operators corresponding to
code to another one such that the new stabilizer group edges starting at the given vertex and ending on vertices
includes the original one while being included into the of a common color. Three choices of color are possible,
original gauge group, while keeping unmodified the logi- allowing one to recover in three different ways an opera-
cal group. The decomposition associated to the original tor corresponding to a vertex. This redundancy can be
code used for achieving fault-tolerant error correction in only
one measurement of the (gauge) operators related to the
H = (A ⊗ B) ⊕C ⊥
| {z } edges [22].
C

then becomes of the form Let ncode be the index of the code which is the number
of vertices of the same color on one edge of the large
H = (A ⊗ B 0 ) ⊕ (A ⊗ B 00 ) ⊕ C ⊥ tetrahedron (denoted as n in [22]). The code distance
| {z } is given by d = 2ncode + 1, and the number of physical
C 0⊥ 3
qubits is 1 + 4ncode + 6n2code + 4n3code = d 2+d [22, 48].
where B 0 is the new gauge qubit space. As a consequence,
a valid code-word for this new code is also valid for the
initial one. The passage of the latter to the new code
is done by measuring the generators of the gauge group, 3. Slicing of the code structure
the results of these measurements giving the correction
to apply on B 0 ⊕ B 00 to remove the components on B 00 .
To process the information, the code structure is de-
For 3D gauge color codes, code switching allows a composed into slices, each slice being map successively
transversal error-corrected implementation of a univer- into the 2D processor. While several cuts in slices are
sal set of gates [22]. possible, we choose slices orthogonal to two faces (see
Figure 14). The processor need to be sized to fit in the
larger slice, that join the edge not included into any of
the two faces to the middle of the opposing edge — the
2. Code geometrical structure magenta slice in Figure 14.

With the lattice described in Ref. [22], the central slice


The geometrical structure of the 3D gauge color codes corresponds to the elementary tetrahedrons for which all
is described in detail in Section 3.1 of [22]. It takes a large vertices coordinates satisfy x + z = ncode − 2 or x + z =
tetrahedron, itself decomposed into elementary tetrahe- ncode − 1 (the elementary tetrahedrons between the two
drons, see Figure 14 for an example. Four extra points plans defined by the previous equations). Note that the
(vi , i ∈ {1, 2, 3, 4}) are then added outside the large tetra- number of slices is given by d − 2.
hedron, one point in front of each facet of the large tetra-
hedron. Elementary tetrahedrons are finally added be- The number of elementary tetrahedrons included in
tween those extra points and the vertices at the surface this slice is counted by considering three tetrahedron sets,
of the large tetrahedron, see Fig. 4b of [48] for an illustra- see Figure 14 and Figure 15. Two sets correspond to the
tion. The vertices of elementary tetrahedrons are colored elementary tetrahedrons having a facet at the interplay
with 4 different colors such that adjacent vertices get a between two slices (Figure 15a (Figure 15b) is associated
different color. Each elementary tetrahedron represents to the elementary tetrahedrons with one facet are the in-
a physical qubit. terplay between the magenta slice and the green (cyan)
15

slice). The last set is associated to the elementary tetra-


(a) First set (b) Second set (c) Third set
hedrons having no facet at the interplay between two
slices (Figure 15c). One can check that the two first sets
2ncode−2
k = 2n2code − 3ncode + 1 elementary tetra-
P
include
k=1
P−2
ncode
hedrons while the last set has (2ncode −1)+ 2(2k +
k=0
1) = 2n2code − 2ncode + 1 elementary tetrahedrons. They
are 16ncode − 2 additional elementary tetrahedrons re-
sulting from the 4 added points in the construction of
the code. In total, the maximum number of elemen-
tary tetrahedron for one slice of the code structure is
6n2code + 8ncode + 1. Since we consider a processor that
can process up to two slices (associated to two different
logical qubits) and accounting for the ancillary subsys-
tems needed to measure the gauge generators by a sim- Figure 15. Decomposition of the central slice for ncode = 3
ple factor of two, we obtain the number of physical qubits (magenta slice in the tetrahedron presented in Figure 14).
in the processor specified in the main text. For more de- Each subfigure corresponds to a set of elementary tetrahe-
drons of the central slice, seen from different point of views.
tails, see the ancillary file tetrahedron_3_bis.scad [21],
On (a) and (b), each triangle corresponds to an elementary
where each tetrahedron color corresponds to a given slice, tetrahedron. On (c) each small rectangle correspond to an
the larger being the magenta one. elementary tetrahedron.

1. Optimal parameters to factor n-bit RSA integers

The resources and parameters needed to factor RSA


integers encoded in n bits are specified in Table I. In
particular, we consider the factorization of RSA integers
with n = 6 bits, the number of bits needed to factor
35. We also consider n = 829 which corresponds to the
largest RSA integer factorized so far [49].

Figure 14. Code geometrical structure for ncode = 3 (with-


out the extra points (vi , i ∈ {1, 2, 3, 4}). Each slice has been 2. Trade-off between qubits and runtime
represented with a specific color. The larger slice is with
the magenta elementary tetrahedrons. The figure shows that
the maximum number of slices involved in an operator corre-
sponding to an edge is 2. We have estimated that an average runtime of 177 days
is needed to factor a 2 048-bit RSA number. There are
several ways to reduce this number, most of them coming
at the cost of using more qubits in the processor. The
Appendix F: Results and possible improvements items below present several ways separately.

• Due to the tetrahedral geometry of the code struc-


We presented in the main text the resources needed ture, only one third of the processor qubits are used
to factor 2 048 RSA integers corresponding to the most during the error correction steps in average. A fac-
common RSA key size. In the first subsection of this ap- tor 3 in time could thus be saved by making use of
pendix, we discuss the factorization of RSA integers of them.
various sizes. The second subsection is dedicated to a
discussion on ways to reduce the runtime to factor RSA • The logical circuit can be parallelized in several
integers and in particular, on the trade-off between the ways, giving a speed-up roughly proportional to the
number of physical qubits in the processor and the run- increase in qubit numbers in the processor. More
time. precisely:
16

n ne m we wm d nqubits texp logical qubits total modes spatial modes temporal modes all memory correction
6 6 4 3 2 7 316 1 min 38 6 650 3 002 5 95 µs
829 1 242 26 3 3 41 10 244 11 days 3 396 117 097 476 8 697 156 39 66 ms
2 048 3 029 30 3 3 47 13 436 177 days 8 284 430 229 540 27 825 956 45 186 ms

Table I. For different integer sizes n and corresponding exponent size ne (∼ 1.5n), the table presents the optimal set of
parameters, processor size and computation runtime, and the memory requirements.

– Some operations in the adder can be paral- in a very high absorption coefficient α = 4.0 m−1 . If
lelized (see Figure 13). The controlled NOT we assume a L = λ/2 cavity, unit absorption and re-
operations aligned vertically can be applied at emission efficiencies are obtained if the quality factor is
the same time. Q = F = 2π/(αλ) ≈ 26 for a 5 GHz cavity. In this
– The runtime is dominated by the time spent low-Q regime, κ  Γ and a coherence time of a few hun-
to implement the CNOT gates of the quan- dreds of microseconds would translate into a multi-mode
tum lookup circuit (see Figure 10) and they capacity of a few tens of modes. By working with crys-
are easily parallelizable. A full parallelization, tals having lower doping concentrations, the coherence
would reduce the factorization of 2 048 RSA time can be significantly increased [50], while still reach-
integers to about 27 days, at the cost of using ing the impedance matching point with low-Q resonators.
about 12 millions qubits in the processor. In this case, a few thousand modes might realistically be
stored very efficiently. Rare-earth doped materials is not
– Oblivious carry runways allows parallelization
the only option and other candidates such as negatively
of the adders [36].
charged nitrogen vacancy color centers in diamond [51] or
– Other type of adders could exploit fur- bismuth donors in silicon [52] may be even more promis-
ther parallelizations, for instance lookahead ing.
adders [44]
– During a product-addition operation, the dif-
ferent additions can be parallelized by com-
puting separately partial sums. ∗
[email protected]
– During the exponentiation, the different mul- †
https://quantum.paris
tiplications can be parallelized by computing [1] M. Kjaergaard, M. E. Schwartz, J. Braumüller,
separately partial products. P. Krantz, J. I.-J. Wang, S. Gustavsson, and W. D.
Oliver, Superconducting qubits: Current state of play,
• The qubit number can be reduced using another Annual Review of Condensed Matter Physics 11, 369
slicing of the code structure, at the cost of a longer (2020), 1905.13641.
computation time. For example, if one chooses to [2] C. Gidney and M. Ekerå, How to factor 2048 bit rsa inte-
cut the tetrahedron by slices parallel to a facet of gers in 8 hours using 20 million noisy qubits, 1905.09749
this tetrahedron, we estimate that a 2 048-bits RSA (2019).
[3] Y. R. Sanders, D. W. Berry, P. C. S. Costa, N. Wiebe,
integer could be factorized with 6 628 qubits in the C. Gidney, H. Neven, and R. Babbush, Compilation of
processor and 354 days. fault-tolerant quantum heuristics for combinatorial opti-
mization, 2007.07391 (2020).
[4] J. Lee, D. Berry, C. Gidney, W. J. Huggins, J. R. Mc-
Clean, N. Wiebe, and R. Babbush, Even more efficient
Appendix G: Realization combining a Rare-earth quantum computations of chemistry through tensor hy-
doped solid and a superconducting resonator percontraction, 2011.03494 (2020).
[5] D. Kielpinski, C. Monroe, and D. J. Wineland, Architec-
ture for a large-scale ion-trap quantum computer, Nature
For implementing a multimode memory with a spin- 417, 709 (2002).
echo technique, materials doped with rare-earth, such [6] D. D. Thaker, T. S. Metodi, A. W. Cross, I. L. Chuang,
and F. T. Chong, Quantum memory hierarchies: Effi-
as Erbium Er3+ provide an appealing example since
cient designs to match available parallelism in quantum
these ions have doubly-degenerate Zeeman states which computing, in 33rd International Symposium on Com-
split when an external magnetic field is applied. Sev- puter Architecture (ISCA’06) (IEEE, 2006) pp. 378–390,
eral manuscripts have reported on the successful cou- quant-ph/0604070.
pling between the crystal Er3+:Y2SiO5 and a supercon- [7] Z.-L. Xiang, S. Ashhab, J.-Q. You, and F. Nori, Hybrid
ducting microwave resonator [32–34]. Ref. [34] in par- quantum circuits: Superconducting circuits interacting
ticular reported on √
the strong coupling with a collec- with other quantum systems, Reviews of Modern Physics
85, 623 (2013), 1204.2137.
tive coupling rate g N̄ = 2π × 34 MHz and an inho-
mogeneous linewidth Γ = 2π × 12 MHz. This results
17

[8] G. Kurizki, P. Bertet, Y. Kubo, K. Mølmer, D. Pet- Matter , Ph.D. thesis (2018).
rosyan, P. Rabl, and J. Schmiedmayer, Quantum tech- [27] A. Kubica, M. E. Beverland, F. Brandão, J. Preskill, and
nologies with hybrid systems, Proceedings of the National K. M. Svore, Three-dimensional color code thresholds via
Academy of Sciences 112, 3866 (2015), 1504.00158. statistical-mechanical mapping, Physical Review Letters
[9] C. Grezes, Y. Kubo, B. Julsgaard, T. Umeda, J. Isoya, 120, 180501 (2018), 1708.07131.
H. Sumiya, H. Abe, S. Onoda, T. Ohshima, K. Naka- [28] S. J. Devitt, W. J. Munro, and K. Nemoto, Quantum
mura, I. Diniz, A. Auffeves, V. Jacques, J.-F. Roch, error correction for beginners, Reports on Progress in
D. Vion, D. Esteve, K. Mølmer, and P. Bertet, Towards Physics 76, 076001 (2013), 0905.2794.
a spin-ensemble quantum memory for superconduct- [29] M. Afzelius, N. Sangouard, G. Johansson, M. U. Staudt,
ing qubits, Comptes Rendus Physique 17, 693 (2016), and C. M. Wilson, Proposal for a coherent quantum
1510.06565. memory for propagating microwave photons, New Jour-
[10] H. Bombı́n and M. A. Martin-Delgado, Exact topologi- nal of Physics 15, 065008 (2013), 1301.1858.
cal quantum order in d = 3 and beyond: Branyons and [30] M. Afzelius and C. Simon, Impedance-matched cavity
brane-net condensates, Physical Review B 75, 075103 quantum memory, Physical Review A 82, 022310 (2010),
(2007), cond-mat/0607736. 1004.2469.
[11] P. W. Shor, Algorithms for quantum computation: dis- [31] T. Chanelière, G. Hétet, and N. Sangouard, Quantum op-
crete logarithms and factoring, in Proceedings 35th An- tical memory protocols in atomic ensembles, in Advances
nual Symposium on Foundations of Computer Science In Atomic, Molecular, and Optical Physics, Vol. 67 (El-
(IEEE Comput. Soc. Press, 1994) pp. 124–134. sevier, 2018) Chap. 2, pp. 77–150, 1801.10023.
[12] M. Ekerå and J. Håstad, Quantum algorithms for com- [32] P. A. Bushev, A. K. Feofanov, H. Rotzinger, I. Pro-
puting short discrete logarithms and factoring RSA in- topopov, J. H. Cole, C. M. Wilson, G. Fischer, A. V.
tegers, in Post-Quantum Cryptography, Lecture Notes in Lukashenko, and A. V. Ustinov, Ultralow-power spec-
Computer Science, Vol. 10346, edited by T. Lange and troscopy of a rare-earth spin ensemble using a supercon-
T. Takagi (Springer International Publishing, 2017) pp. ducting resonator, Physical Review B 84, 060501 (2011),
347–363, 1702.00249. 1102.3841.
[13] R. L. Rivest, A. Shamir, and L. Adleman, A method [33] M. U. Staudt, I.-C. Hoi, P. Krantz, M. Sandberg,
for obtaining digital signatures and public-key cryptosys- M. Simoen, P. A. Bushev, N. Sangouard, M. Afzelius,
tems, Communications of the ACM 21, 120 (1978). V. S. Shumeiko, G. Johansson, P. Delsing, and C. M. Wil-
[14] W. Diffie and M. E. Hellman, New directions in cryp- son, Coupling of an erbium spin ensemble to a supercon-
tography, IEEE Transactions on Information Theory 22, ducting resonator, Journal of Physics B: Atomic, Molec-
644 (1976). ular and Optical Physics 45, 124019 (2012), 1201.1718.
[15] Information Technology Laboratory, Digital signature [34] S. Probst, H. Rotzinger, S. Wünsch, P. Jung, M. Jerger,
standard (DSS) (2013). M. Siegel, A. V. Ustinov, and P. A. Bushev, Anisotropic
[16] P. W. Shor, Polynomial-time algorithms for prime factor- rare-earth spin ensemble strongly coupled to a supercon-
ization and discrete logarithms on a quantum computer, ducting resonator, Physical Review Letters 110, 157001
SIAM Journal on Computing 26, 1484 (1997), quant- (2013), 1212.2856.
ph/9508027. [35] R. B. Griffiths and C.-S. Niu, Semiclassical fourier trans-
[17] M. Ekerå, Modifying shor’s algorithm to compute short form for quantum computation, Physical Review Letters
discrete logarithms (2016), https://eprint.iacr.org/ 76, 3228 (1996), quant-ph/9511007.
2016/1128. [36] C. Gidney, Approximate encoded permutations and
[18] M. Ekerå, On post-processing in the quantum algorithm piecewise quantum adders, 1905.08488 (2019).
for computing short discrete logarithms (2017), https: [37] S. A. Cuccaro, T. G. Draper, S. A. Kutin, and D. P.
//eprint.iacr.org/2017/1122. Moulton, A new quantum ripple-carry addition circuit,
[19] M. Ekerå, Quantum algorithms for computing general quant-ph/0410184 (2004).
discrete logarithms and orders with tradeoffs (2018), [38] C. Gidney, Halving the cost of quantum addition, Quan-
https://eprint.iacr.org/2018/797. tum 2, 74 (2018), 1709.06648.
[20] C. Gidney, Windowed quantum arithmetic, 1905.07682 [39] V. Vedral, A. Barenco, and A. Ekert, Quantum networks
(2019). for elementary arithmetic operations, Physical Review A
[21] The code is available at https://github.com/ 54, 147 (1996), quant-ph/9511018.
ElieGouzien/factoring_with_memory. [40] C. Zalka, Shor’s algorithm with fewer (pure) qubits,
[22] H. Bombı́n, Gauge color codes: optimal transversal gates quant-ph/0601097 (2006).
and gauge fixing in topological stabilizer codes, New [41] R. Babbush, C. Gidney, D. W. Berry, N. Wiebe, J. Mc-
Journal of Physics 17, 083002 (2015), 1311.0879. Clean, A. Paler, A. Fowler, and H. Neven, Encoding elec-
[23] E. T. Campbell, B. M. Terhal, and C. Vuillot, Roads tronic spectra in quantum circuits with linear t complex-
towards fault-tolerant universal quantum computation, ity, Physical Review X 8, 041015 (2018), 1805.03662.
Nature 549, 172 (2017), 1612.07330. [42] D. W. Berry, C. Gidney, M. Motta, J. R. McClean, and
[24] H. Bombı́n, Single-shot fault-tolerant quantum error cor- R. Babbush, Qubitization of arbitrary basis quantum
rection, Physical Review X 5, 031043 (2015), 1404.5504. chemistry leveraging sparsity and low rank factorization,
[25] B. J. Brown, N. H. Nickerson, and D. E. Browne, Fault- Quantum 3, 208 (2019), 1902.02134.
tolerant error correction with the gauge color code, Na- [43] T. G. Draper, Addition on a quantum computer, quant-
ture Communications 7, 12302 (2016), 1503.08217. ph/0008033 (2000).
[26] A. M. Kubica, The ABCs of the Color Code: A Study [44] T. G. Draper, S. A. Kutin, E. M. Rains, and K. M. Svore,
of Topological Quantum Codes as Toy Models for Fault- A logarithmic-depth quantum carry-lookahead adder,
Tolerant Quantum Computation and Quantum Phases Of Quantum Information and Computation 4-5, 351 (2006),
18

quant-ph/0406142. (2020).
[45] Due to the measurement-based uncomputation, it is often [50] M. Le Dantec, M. Rancic, E. Flurin, D. Vion, D. Es-
more efficient to implement the non-Clifford operations teve, P. Bertet, P. Goldner, T. Chanelière, B. Sylvain,
through AND computation. In case of direct implemen- S. Lin, and R. B. Liu, Twenty millisecond electron-spin
tation of Toffoli gates, the circuit cost could be slightly coherence in an erbium doped crystal, in Bulletin of the
reduced. American Physical Society (American Physical Society,
[46] D. Poulin, Stabilizer formalism for operator quantum 2021).
error correction, Physical Review Letters 95, 230504 [51] Y. Kubo, C. Grezes, A. Dewes, T. Umeda, J. Isoya,
(2005), quant-ph/0508131. H. Sumiya, N. Morishita, H. Abe, S. Onoda, T. Ohshima,
[47] P. Zanardi, D. A. Lidar, and S. Lloyd, Quantum tensor V. Jacques, A. Dréau, J.-F. Roch, I. Diniz, A. Auffeves,
product structures are observable induced, Physical Re- D. Vion, D. Esteve, and P. Bertet, Hybrid quantum
view Letters 92, 060402 (2004), quant-ph/0308043. circuit with a superconducting qubit coupled to a spin
[48] M. E. Beverland, A. Kubica, and K. M. Svore, The cost ensemble, Physical Review Letters 107, 220501 (2011),
of universality: A comparative study of the overhead of 1110.2978.
state distillation and code switching with color codes, [52] V. Ranjan, J. O’Sullivan, E. Albertinale, B. Albanese,
2101.02211 (2021). T. Chanelière, T. Schenkel, D. Vion, D. Esteve, E. Flurin,
[49] F. Boudot, P. Gaudry, A. Guillevic, N. Heninger, J. J. L. Morton, and P. Bertet, Multimode storage of
E. Thomé, and P. Zimmermann, Factorization of rsa-250 quantum microwave fields in electron spins over 100 ms,
Physical Review Letters 125, 210505 (2020), 2005.09275.

You might also like