Factoring 2048 RSA in 177 Days
Factoring 2048 RSA in 177 Days
Factoring 2048 RSA in 177 Days
temporal modes
for realizing quantum computers [1]. The standard ar-
chitecture consists in laying superconducting qubits in a
2D grid and making the computation using only neigh-
boring interactions. Recent estimations showed however
that fault-tolerant realizations of various quantum algo-
es
od
rithms with this architecture would require millions phys-
lm
ia
ical qubits [2–4]. These performance analyses naturally
at
sp
raise the question of an architecture better exploiting the
potential of superconducting qubits.
tioning of an actual quantum computer as its outcome mod N ) satisfies r > (p + q − 2)/2, computing the dis-
can be verified efficiently. Last but not least, the cost crete logarithm of h modulo N , as detailed later, yields
of its implementation has been evaluated using plausible d = (p + q − 2)/2. For large N , the assumption is ver-
physical assumptions for a large scale processor with a ified with a high probability [12]. Using N = pq and
standard 2D grid of superconducting qubits (a charac- d = (p + q − 2)/2, where N and d are both known, p
teristic physical gate error rate of 10−3 , a surface code and q can be recovered by choosing one of the solutions
cycle time of 1 µs, and a reaction time of 10 µs): it was of the second order equation N = p(2d + 2 − p), and then
estimated that it should be possible to factor a 2 048 bits exploiting q = 2d + 2 − p.
RSA integer in 8 hours with 20 millions qubits [2].
The discrete logarithm is computed in Ekerå and
By taking this cost estimation as a reference, we es- Håstad’s algorithm in three steps. First, the exponenti-
timate the approximate cost of implementing the same ation (e1 , e2 ) → g e1 h−e2 is applied once on two quantum
version of Shor’s algorithm in terms of physical process- registers prepared in a superposition of every possible
ing qubit number, multimode capacity, memory storage values of e1 and e2 respectively. Two quantum Fourier
time, and runtime. Our cost evaluation is given in the transforms are then applied independently to the two reg-
case where the processor is made with two (dressed) log- isters before being measured. Finally, a classical post-
ical qubit slices. Under the assumptions used in Ref. [2] processing extracts the discrete logarithm d of h modulo
for the gate error rate and the cycle time, we show that N from the measurement results. Because the measure-
it should be possible to factor 2 048 bits RSA integer in ments are performed directly after the Fourier transform,
177 days using a multimode memory with a storage time the cost of the exponentiation largely dominates the cost
of about 2 hours and a processor including 13 436 physical of Ekerå and Håstad’s algorithm, see Appendix A.
qubits — a reduction by more than 3 orders of magni-
tude of the number of physical qubit as compared to the Number of gates — The modular exponentiation
standard architecture without memory [2]. By insert- needed in Ekerå and Håstad’s algorithm, i.e. the opera-
ing additional error-correction steps, we show that the tion |ei |1i 7→ |ei |g e mod N i, with the input e and the
storage time can be significantly reduced at the cost of output g e mod N encoded on ne and n bits respectively,
a slight increase of the runtime. We also explain how can be decomposed into ne multiplications, each being
shorter runtimes and storage times are achievable at the decomposed into 2n controlled additions of integers of
cost of increasing the number of qubits in the processing typical size n and 1 controlled swap between two regis-
unit. We propose a realization of such an architecture ters of size n, giving a total number of 2ne n (ne ) con-
using a microwave interface between a processor made trolled additions (swaps between registers respectively),
with superconducting qubits and a multiplexed memory see Appendix B for a detailed explanation. Each addi-
using the principle of photon echo in solids doped with tion is a modular addition, which can be obtained with a
rare-earth ions embedded in cavities. standard adder circuit at the cost of a specific represen-
tation — the coset representation (see Appendix C) —
Principles of (a variant of ) Shor’s algorithm — We adding m additional qubits to the register. This means
consider the factorization of a number N = p × q con- that 2ne (n + m) controlled additions of integers of size
structed as the product of two prime numbers of similar n + m and ne controlled register swaps between regis-
sizes, p and q. We note n the number of bits involved ters of size n + m are needed for realizing Ekerå and
in the binary representation of N , that is 2n−1 ≤ N < Håstad’s algorithm. A controlled swap operation be-
2n . While no efficient classical factorization algorithm is tween two qubits can be performed using two CNOTs
known, Shor’s algorithm and its variants factor N with and one Toffoli gates. Hence, the total cost for controlled
a polynomial complexity into n [11, 12, 16–19]. swaps operating on two registers using n + m qubits is of
2(n+m) CNOTs and n+m Toffoli gates, see Appendix B.
The version of Shor’s factorization algorithm proposed For the controlled addition, we can use a semi-classical
by Ekerå and Håstad [12] starts by first selecting ran- adder whose mean cost for integers of size n + m is of
domly an integer g such that 1 < g < N and then 5.5(n + m) − 9 CNOTs and 2(n + m) − 1 Toffoli gates, see
checking that g falls in the multiplicative group of in- Appendix B. Given the number of gates in the controlled
tegers modulo N (g ∈ Z∗N ) using Euclid’s algorithm. If addition and swap operations, the number of additions
not, g is a prime factor of N . Once g is shown to be and swaps in the multiplication and the number of multi-
in Z∗N , the number h = g (N −1)/2 is computed. Given plications in the modular exponentiation, the cost of fac-
that the order of Z∗N is φ(N ) = (p − 1)(q − 1), we torization can easily be estimated, see Appendix B. This
have h = g (pq−p−q+1)/2 g (p+q−2)/2 ≡ g (p+q−2)/2 mod N cost can however be reduced using windowed arithmetic
where the last equivalence is the result of Chinese re- circuits [20]. The basic idea consists in grouping the bits
mainder theorem. Under the assumption that the order of e by blocks (in blocks including we bits) for control-
r of g (the smallest non-negative integer such that g r ≡ 1 ling each multiplication, hence reducing the number of
3
these multiplications. Similarly, for each multiplication, error probability on one logical qubit given in [25, eq. (4)]
the input bits are grouped (in blocks including wm bits)
to reduce the number of additions composing each mul- p
plogical = A exp α log dβ (1)
tiplication. As detailed in Appendix D, the cost of expo- pth
nentiation is dominated in this case by 2 new(n+m)n
e wm
one
where A ≈ 0.033, α ≈ 0.516, β ≈ 0.822, p is the error
qubit gates, (2we +wm n + 12(n + m)) new(n+m) CNOTs,
2
e wm probability per physical qubit, d the code distance which
and 4 new(n+m)
e wm
Toffoli gates. We emphasize that this is a is related to the number of physical qubits per logical
first order estimation. In the code used to compute the qubits (see below) and pth the fault-tolerance threshold.
required resources and find the optimal parameters, the We take pth = 0.75 %, a value reported in Ref. [26] us-
complete formulae have been used [21]. ing a projection decoder. Note that estimations suggest
that the threshold could be higher, with a more efficient
Error correction — The error correction is achieved decoder [27].
using 3D gauge color codes, a family of subsystem codes
presented in Ref. [10]. A first code admits a transversal Architecture — For simplicity, the tetrahedral struc-
implementation of CNOT and Hadamard gates while a ture of the error correction (see Appendix E) can be in-
second code accepts a transversal implementation of the cluded into a large cube in which the physical qubits
non-Clifford T gate. Switching between the two codes are now represented by elementary cubes, see Figure 1.
gives a universal set of gates without the need for state The large cubes are stored into the memory and loaded
distillation [22], contrary to standard ways of operating by slices into the processor when they need to be pro-
the surface code [23]. cessed. We size the processor such that one slice of two
large cubes can be loaded simultaneously, which is con-
venient to perform two qubits gates efficiently. Each gate
The two codes are based on a shared geometrical struc- is immediately followed by an error correction round on
ture: a large tetrahedron constructed from elementary the processed qubits. This is done by reloading again
tetrahedrons (see Appendix E for details). A physical each slice sequentially in the processor and by measur-
qubit is attributed to each elementary tetrahedron. As ing the gauge generators (before recovering classically
in any subsystem codes, the stabilized subspace is split the code stabilizers), each of them using up to 6 two-
into a tensorial product of the (bare) logical and gauge qubit gates, one auxiliary qubit and one measurement
qubits (the dressed logical qubits include the bare logical of this auxiliary [22, 28]. Note that the codes of inter-
qubits and the gauge qubits). A set of operators — gen- est are 3D-local and the auxiliary qubits only needs to
erators of gauge operators — are measured, each being keep coherence for the time of loading and measuring
the product of (up to six) X (or Z) operators associated two successive slices for performing successfully a sta-
to qubits corresponding to the tetrahedrons sharing the bilizer measurement. Once the syndromes are obtained
same edge. From these measurements, the values of sta- and the errors are detected, the correction of these errors
bilizers of the two codes are deduced. In the code used is delayed and merged with the next operation applied
for implementing the H and CNOT gates, the stabiliz- on the qubit to be corrected. Further note that all-to-
ers are defined from the vertices, i.e. the product of X all connectivity between the logical qubits is achieved if
(or Z) operators associated to qubits corresponding to each physical address in the memory can be mapped to
the tetrahedrons sharing the same vertex. In the code three physical qubits in the processor: two for the imple-
used for implementing the T gates, the stabilizers are de- mentation of two-qubit gates (depending on whether the
fined from the vertices for the X operators and from the physical qubit is involved in the logical controlled or tar-
edges for the Z operators. The value of an operator rep- get qubits) and one for performing the error correction.
resented by a vertex is classically recovered by multiply- For achieving a code distance d, corresponding to a cube
ing the measurement results of combinations of specific height d, the number of physical qubits in the processor
edges ending at the given vertex. Several combinations 2
is nqubits = 2×2× 3d +2d−3
2 , corresponding to two logical
are possible giving redundancies that can be exploited to
qubit slices (see Appendix E) and including the ancillary
achieve fault-tolerant error correction with only one run
qubits (essentially one per physical qubit) needed for the
of measurements [24]. The structure of codes in which the
stabilizer measurements. For a code distance d, we ap-
stabilized subsystem is the tensor product of the gauge
proximate the time it takes to perform one (single qubit
and (bare) logical subsystems guarantees that the mea-
or two qubit) logical gate by 2(d − 2)tc where tc is the
surements of gauge operators don’t reveal the value of
cycle time of the 2D processor (time to load one qubit
the (bare) logical qubit (see Appendix E).
slice, to measure the stabilizers, which is longer than the
gate operation, and to reload the slice into the memory)
To account for the additional resource needed to im- and the factor 2 comes from the fact that the gate is
plement these codes, we use an estimation of the residual immediately followed by an error correction round.
4
time
9
week to be stored. Note that the number of stored modes does
8
Expected time
7 not enter in the volume and is thus not optimized. Note
6 day
also that the qubit address in the memory can be iden-
5
tified by temporal indexes only at the cost of a longer
4 hour runtime when photon-echo type protocols are used, cf.
below for a concrete example.
3
grid and surface code). If instead of using a spatially and right after the Hadamard provided that the following
temporally multiplexed memory, the qubits are stored in phase gates are classically controlled by the result of this
the same spatial mode and are identified by (6 650) tem- measurement, see Figure 3c. In this case, the successive
poral addresses only, we evaluate the same factorization classically controlled phase gates operating on the same
to be possible in about 1 day using a memory bandwidth qubit can be merged together, leading to a circuit with
4Γ = 2π × 48 MHz and taking into account the corre- one phase gate, one Hadamard gate and one measure-
sponding deadtime between two memory readouts. In ment per qubit. When this semi-classical Fourier trans-
this case, error correction of all the qubits stored in the form operates on a register made with ne qubits (the
memory is estimated to take 132 ms meaning the stor- number of bits of the exponent), its cost is linear in ne
age time needs to be longer than 132 ms. For a memory and is thus negligible compared to the cubic complexity
bandwidth 4Γ = 2π × 120 MHz, the same factorization of the exponentiation.
would take 9 hours and error correction is estimated to
take 53 ms. As discussed in Appendix G, these require-
ments can realistically be met with a realization of the
quantum memory protocol described before combining a Appendix B: Decomposition of the exponentiation
solid doped with rare-earth and a superconducting mi- into elementary gates
crowave resonator [32–34].
Conclusion — We have shown that the use of a quan- In this appendix, we aim to give a clear view of how to
tum memory for quantum computing is appealing as un- decompose the modular exponentiation into elementary
processed qubits can be loaded into the memory which gates. The presented method is intended to be simple
significantly reduces the size of the processor. All-to-all to understand, but not optimal. A more efficient one is
connectivity between physical qubits can be obtained if presented in Appendix D.
each address in the memory can be mapped to arbitrary
qubits in the processor, hence offering many opportuni-
ties for error correction and for implementing algorithms
with gates operating between non-neighbouring qubits. 1. Decomposition of a modular exponentiation into
The effective improvement of the coherence time of un- additions
processed qubits is however not crucial, because error
correction is exponentially efficient, that is, a small in-
crease of the code distance can compensate a significant The modular exponentiation needed in Ekerå and
reduction of coherence time. Håstad’s algorithm, i.e. the operation |ei |1i 7→
|ei |g e mod N i, with the input e and the output g e
We acknowledge M. Afzelius, J.-D. Bancal, E. Flurin, mod N encoded on ne and n bits respectively, can be
P. Bertet, P. Sekatski, X. Valcarce and J. Zivy for stim- implemented from controlled modular additions as we
ulating discussions and/or for critically reviewing the show now. For simplicity, we omit the modulo in this
manuscript. We acknowledge funding by the Institut de paragraph.
Physique Théorique (IPhT), Commissariat à l’Énergie
Atomique et aux Energies Alternatives (CEA) and the Let ene −1 . . . ei . . . e0 be the binary form of e. The ex-
Region Île-de-France in the framework of DIM SIRTEQ. ponentiation can first be seen as a sequence of multipli-
cations
e −1
nY e −1 h
nY iei
i i
Appendix A: Semi-classical Fourier transform ge = g2 ei
= g2 (B1)
i=1 i=1
We here discuss the semi-classical Fourier transform where each multiplication is controlled by the bit value
presented in [35] and show that its cost is negligible. ei . Figure 4 shows an implementation of such a mul-
The standard way to perform the Fourier transform on tiplication in which a quantum register encoding the
i
ne qubits is shown in Figure 3a: it requires a sequence of integer x ends up into an encoding of x × g 2 ei . It
one Hadamard and controlled phase gates for each qubit. uses two controlled product-additions, i.e. the opera-
In Shor’s algorithm as well as in Ekerå and Håstad’s ver- tion letting |yi |zi unchanged if |ei i = |0i and map-
i
sion of Shor’s algorithm, the qubits are measured right ping |yi |zi into |yi |z + y × γi ((y, z, γ) → (x, 0, g 2 ) for
after the Fourier transform, hence explaining the mea- the first product-addition appearing in Figure 4 and
i
surements of each qubit at the end of gate sequences in (y, z, γ) → (x̄, x, −g −2 ) for the second one, where the
Figure 3a. The simple rearrangement presented in Fig- negative power stands for multiplicative inverse modulo
ure 3b shows that the measurement can be performed N ) when |ei i = |1i. In case |ei i = |1i, the mapping
6
H • ··· • ···
φ2 · · · H • ··· •
··· φ2 · · · H •
· · · φne · · · φne −1 φ2 H
H • • ··· • ···
φ2 H • ··· • ···
φ3 φ2 H ··· ··· •
··· φ ne φne −1 · · · φ2 H
(c) Semi-classical Fourier transform. The successive classically controlled rotation gates can
be merged.
H • • ··· • ··· +3
φ2 H • ··· • ··· +3
φ3 φ2 H ··· ··· • +3
··· φ ne φne −1 · · · φ2 H
Figure 3. Different versions of the Fourier transform followed by measurements. They are used to convince the reader that the
number of gates in the Fourier transform is negligible with respect to the cost
of the exponentiation. These three versions are
2πi
based on the phase gates φk defined as a 2 × 2 matrix with diagonal elements 1, e 2k and zeros off diagonal. Note that the
control and target qubits can be reversed in the representation of each controlled phase gate without changing the result.
|ei i • |e
ii i E
2i 2 ei them controlled both by the values of bits |yj i and |ei i.
|xi / ×g Figure 5 shows explicitly the decomposition of the first
xg
= product-addition appearing into each multiplication of
|ei i • • • |e
ii i E
i 2 ei the exponentiation.
|xi / Input x +x̄(−g −2 ) × xg
i
|0i / +xg 2 Input x̄ × |0i
|ei i • |ei i • ··· • ···
Figure 4. Principle of a modular multiplication circuit trans- |x0 i • ··· ···
..
forming a quantum register encoding the integer x into a state
··· ···
i |xi Input x = .
encoding x × g 2 ei . A first product-addition |xj i ··· • ···
E operation trans-
..
··· ···
i
forms auxiliary qubits in |0i into x × g 2 if |ei i = |1i. Then, .
i i
i i / +xg 2 / +20 g 2 · · · +2j g 2i · · ·
a product-addition applies +x̄(−g −2 ) with x̄ = x × g 2 into
the register encoding x if |ei i = |1i. A final swapping is ap-
Figure 5. Decomposition of the first product-addition appear-
i
E
plied if |ei i = |1i to put the quantum register into x × g 2 ei
ing in each element of the decomposition of the exponentiation
and resets the auxiliary qubits to |0i. Note that all the oper- into multiplications, see Figure 4.
ations are performed modulo N .
is performed by considering the binary representation We deduce that the modular exponentiation requires
yn−1 . . . y0 of y and by rewriting the product as ne multiplications, each being decomposed into 2n con-
n−1 n−1 trolled additions and 1 controlled swap between two reg-
isters, giving to a total number of 2ne n (ne ) controlled
X X
γ2j yj =
j
y×γ = γ2 yj . (B2)
j=0 j=0 additions (swaps between registers respectively). Each
addition needs to be modular, which can be obtained
As yj is either 0 or 1, the controlled product-addition with a specific representation and a standard adder cir-
can be implemented by a sequence of additions, each of cuit.
7
For the controlled addition, note first that since we Modular addition is typically implemented with vari-
use the coset representation of integers, a circuit for con- ants of the addition: an addition, a comparison,
trolled addition modulo a power of two is sufficient to im- a controlled correction and the clean-up of ancillary
plement a controlled modular addition. Such an addition qubits [39]. As exposed in main text, coset representa-
can be implemented with the semi-classical adder pre- tion of integers, introduced by Zalka [40] and formalized
sented in Figure 6a, which is inspired by Refs. [3, 37, 38]. by Gidney [36], can be used to approximate the modular
It shows the basic circuit taking a classical value 2j γ and addition with a single standard adder circuit.
a register encoding z 0 and returning 2j γ and z 0 +2j γ if the
two controlled qubits |ei i and |xj i are both in state |1i. The basic idea of the coset representation for adding
When such an addition is applied on a quantum register 2j γ modulo N to a quantum register encoding the in-
encoding z 0 using n + m qubits, the block in the dashed teger z is to extend the register for z with m additional
box of Figure 6a is repeated n + m − 2 times, giving a 2m −1
qubits and to encode it into the state √21m
P
|z + kN i.
mean cost of 5.5(n+m)−9 CNOTs and 2(n+m)−1 Tof- k=0
foli gates. Except at the bounds, this state is invariant under the
8
2m
P−1
Figure 7. Preparation proposed in [36, Fig. 1] of a quantum register with n + m qubits in the state √1 |z + kN i as
2m
k=0
requested in the initialization of the coset representation. The first controlled operation adds the integer N to the register
made with n+m qubits provided that the ancillary qubit is in state |1i. The first classically controlled operation aims to change
the phase of the input state encoded in n + m qubits if and only if the result of the measurement is 1 and the number encoded
in the n + m qubits is larger or equal than N . In case one of the two conditions is not met, the input state is unchanged.
(a) Multiplication for the exponentiation, with i = 2 and we = 2, decomposed into product-additions.
|ei Input ei:i+we Input ei:i+we Input ei:i+we
=
i
+x̄ −g −2 ei:i+we
i
|xi / ×g 2 ei:i+we
mod N |xi / Input x mod N
i
|0i / +xg 2 ei:i+we
mod N Input x̄ h0|
Input x0:3
Input x
=
Input x3:6
i i i
/ +xg 2 ei:i+we
mod N / +20 x0:3 g 2 ei:i+we
mod N +23 x3:6 g 2 ei:i+we
mod N
(c) Modular addition of a number read with a table lookup, as needed in (b).
we + wm we + wm
/ Input k / Input k Input k
n |Tk i
= |0i / Load Tk Input Tk Unload Tk |0i
Figure 9. Windowed arithmetic subcircuits for the modular exponentiation. When not specified, the register size is n+m qubits
(register encoded into the coset representation of integers).
As for the standard algorithm, the multiplications Loading a value Tk into a quantum register is done
of the product (D4) are implemented successively and using a quantum table lookup circuit which we discuss
each multiplication is decomposed into a sequence of two right after. The subsequent subsection is dedicated to
product-additions, as shown in Figure 9a. The difference the task aiming to unload the value Tk and reset the
is that the added number now depends on the number register in state |0i. The last subsection is dedicated to
ei:i+we . the requested addition.
Figure 9c finally shows the implementation of an addi- The quantum table lookup proposed in [41], pro-
tion +Tk mod N of a quantity Tk that depends on the duces the following operation on basis states: |ki |xi 7→
value k. It requires three steps. First, the number Tk is |ki |x ⊕ Tk i with ⊕ the bitwise XOR operator. For state
loaded into an ancillary register. Second, this number is preparation, as required in the first step of the opera-
unconditionally added to the desired register and finally tion presented in Figure 9c, the target register starts in
the ancillary register is cleaned up. Note that the value
i
the state |0i such that control and target registers end
of Tk (given by Tk1 ,k2 = 2i k1 g 2 k2 , with k1 = ei:i+we and in |ki |Tk i. The circuit presented in Figure 10 shows the
k2 = xi:i+wm , k being the concatenation of k1 and k2 ) to principle of this operation with registers for k and Tk
be added is known classically. Its addition being realized composed respectively of 3 and 5 qubits.
modulo N , its value can be computed modulo N before
being loaded. n bits are thus sufficient to encode Tk . Concretely, the numbers Tk specify the set of controlled
11
NOT to be used (the question mark on the controlled Let us now focus on a specific qubit indexed by j ∗ . We
NOT means that a controlled NOT is applied on qubit label K0 = {k | (Tk )j ∗ = 0} and K1 = {k | (Tk )j ∗ = 1}.
i only when the ith bit of Tk takes the value 1). The The state before the uncomputation can be rewritten as
circuit operating on the bits ki of k prepares the last an-
cillary qubit (line 5 from the top) in the state |1i at the X O E
time (specified by k) where the gates corresponding to
αk |ki (Tk )j |0ij ∗
j
Tk are applied, and |0i otherwise. The building block k∈K0 j6=j ∗
of the circuit is boxed in Figure 10. It uses 1 CNOT, 1 X O E
AND computation and uncomputation. Given that k is + αk |ki (Tk )j |1ij ∗ . (D6)
j
encoded into the number of bits we + wn and can thus k∈K1 j6=j ∗
take 2we +wn different values, the number of blocks in
Pm −1 j
we +w By applying a Hadamard gate on the j ∗ th qubit, we ob-
the upper part of Figure 10 is given by 2 = tain
j=1
O E
2we −wn − 2. This means that 2we +wm − 2 CNOT gates,
X
αk |ki (Tk )j
2we +wm − 2 AND computations and uncomputations are 1
k∈K0 j6=j ∗
j
√ E |0ij ∗
needed to implement these blocks. Moreover, the num- 2 +
X O
ber of controlled multi-NOT gates to load the value Tk α |ki k (T ) k j
j
k∈K1 j6=j ∗
is given by 2we +wm , each gate being decomposed into X O E
n/2 CNOT in average since Tk takes n bits. When αk |ki (Tk )j
including the (two) NOT gates operating on the high- j
1
k∈K0 j6=j ∗
+√ |1ij ∗ . (D7)
est bit of k, we conclude that the table lookup uses 2 −
X O E
2 NOT gates, 2we +wm − 2 + 2we +wm −1 n CNOT gates, αk |ki (Tk )j
j
k∈K1 j6=j ∗
2we +wm −2 AND computations and uncomputations (cor-
responding to 2 × (2we +wm − 2) Toffoli gates). Hence, if the measurement of the j ∗ th qubit yields 0, the
qubit is properly uncomputed. If the result is 1, a phase
shift needs to be applied on states corresponding to the
indexes k ∈ K1 .
3. Table unlookup
This uncomputation is successively applied to all the
qubits encoding the numbers Tk . Let tj be the mea-
The purpose of the table unlookupPoperation (last surement result of the jth qubit. The state after all the
step in Figure 9c) is to map the state αk |ki |Tk i into measurements is given by
k X
P
αk |ki, where αk are some complex coefficients. A nat- αk σk |ki , (D8)
k k
12
Q tj (Tk )j
with σk = (−1) . We now label k:s in an ancillary register in the unary representation: a
j register with 2s qubits representing a number k:s with the
state of the qubit number k:s being |1i and all the other
K = {k | σk = −1}. (D9)
qubits in the state |0i. The qubits in state |ks: i and the
In order to recover the desired state, we need to correct ancillary qubits are then used as control and target qubits
selectively the phase of terms |ki for which k ∈ K. for a lookup circuit where the controlled multi-NOT gates
are replaced by controlled multi-Z gates. Finally, the
s
|k:s i / Input k:s Input k:s ancillary register is uncomputed. The circuit used to
we + wm − s initialize the ancillary register in shown in Figure 12a.
|ks: i / Input ks: The one used to put it back in its initial state is given
2s in Figure 12b.
Init unary / H ⊕Fks: H Deinit unary
Figure 11. Representation of the four steps proposed in Starting from x encoded in s qubits, the conversion to
Ref. [20] to selectively change the phase of components |ki the unary representation takes 1 NOT gate, 2s −1 CNOT
in the state given in (D8) when the index k belongs to gates and 2s − 1 AND computation. The conversion
K (D9). The central operation is a table lookup with the back to the binary representation takes 1 NOT gate,
s
values Fks: =
2P −1
2j δ(j + 2s ks: ) where δ() is the indicator
2s − 1 CNOT gates and 2s − 1 AND uncomputation.
j=0 Given that k is encoded in we + wm bits, and that a
function of K. choice s = we +w
2
m
is judicious to minimize the num-
ber of gates, the change of phase of components |ki
takes 2b 2 c+1 + 4 single qubit gates, 2we +wm −1 +
we +wm
(a) Binary to unary (b) Unary to binary
2b w +w c+1 + 2d we +w e − 4 CNOTs and 2b we +w c+
we +wm m m
conversion conversion 2 2 2
•
• 2d e
2
m
e − 3 ANDs (1 NOT gate, 2 b we +wm
2 c − 1 CNOT
|xi • • |xi • •
gates and 2b 2 c − 1 AND computation for the
we +wm
• • • • • • • •
unary conversion, 2 × 2b 2 c Hadamard gates around
we +wm
|0i • • • • • • |0i
the table lookup, 2 NOT gates, 2d 2 e − 2 +
we +wm
• • • • • •
2we +wm −1 CNOT gates, 2d 2 e − 2 AND computa-
we +wm
• • • •
• • • • tions and uncomputations for the lookup circuit and
1 NOT gate, 2b 2 c − 1 CNOT gates and 2b 2 c −
we +wm we +wm
• •
• •
• • 1 AND uncomputation for the binary conversion). In-
• • cluding the additional n Hadamard gates and n mea-
Figure 12. (a): representation of the circuit proposed in surements on Tk , we conclude that the table unlookup
takes 2b 2 c+1 + n + 4 single qubit gates, 2we +wm −1 +
we +wm
Ref. [20] for preparing a copy in a ancillary register of an inte-
ger x in a unary representation starting from an encoding of
2b w +w c+1 + 2d we +w e − 4 CNOTs and 2b we +w c+
we +wm m m
2 2 2
x in a control register in the binary representation. The first
2d e
2
m
e − 3 ANDs.
not operation prepares the first qubit in the ancillary regis-
ter in state |1i. The first AND computation writes the result
of an AND operation between the first bit of x and the bit
1 encoded in the first qubit of the ancillary register into the
second qubit of the ancillary register. In case the state of the 4. Standard adder
latter is |1i, the state of the first qubit of the ancillary regis-
ter is changed to |0i. The combination of AND and CNOT
operations is successively repeated until the desired qubit of
As we use the coset representation of integers with win-
the ancillary register is in state |1i. (b): representation of the
circuit proposed in Ref. [20] to erase the value in the ancillary dowed arithmetic operations, a circuit for unconditional
register while keeping the integer x into the control register. addition modulo a power of two is sufficient to imple-
The circuits (a) and (b) corresponds to the first and third op- ment a modular addition. The adder we use, which is
erations needed for the selective phase correction operation described in [38] and optimized from [37] for use with
presented in Figure 11. T gates, is presented in Figure 13. It is thrifty in gate
number and ancillary qubits, at the cost of being deeper
The selective phase correction is done in four steps [20], than other circuits [43, 44], which is not a disadvantage
as shown in Figure 11. First, the control register which for our architecture.
uses we + wm qubits in state |ki is split in two groups.
The first group is made with s qubits in state |k:s i. The As presented in Figure 9c, the adder needs to add
second group takes the remaining we + wm − s qubits in a number Tk taking n qubits into a register with n +
state |ks: i, such that |ki = |ks: i ⊗ |k:s i and k = k:s + m qubits. To achieve this, either the first register for Tk
2s ks: . The second step consists in writing the integer is extended with qubits in the |0i state, either we use
13
x0 • • • x0
y0 • • (y + x)0 that the swap operation is realized by simply relabel-
• • • • ing the register, hence is for free. According to the
x1 • • • x1 counts obtained from previous subsections, the cost of
y1 • • (y + x)1 the exponentiation is dominated — in the limit n → ∞,
ne = O(n), we and wm constant — by: 2 new (n+m)n
e wm
one
• • • • qubit gates, (2we +wm n + 12(n + m)) new(n+m) CNOTs,
x2 x2 e wm
• • • 2
The cost of the addition circuit (Figure 13) is 6(n + Appendix E: Error correction
m) − 9 CNOT gates and n + m − 1 AND computations
and uncomputations.
This appendix is dedicated to 3D gauge color codes.
The first subsection is dedicated to the principle of sub-
system codes. The second subsection describes the ge-
5. Cost estimation
ometrical structure of 3D gauge color codes. The last
subsection provides a detailed description of the cut of
the code structure that is used to process and correct the
In summary, the parameters of the logical circuit for logical qubits.
computing the modular exponentiation are
of the logical qubits space A and the gauge qubits space The measured operators are the gauge generators for
B [47], that is, the Hilbert space is decomposed as the code used to implement the H and CNOT gates
— the (1, 1) code (see [22]). These generators are de-
H = (A ⊗ B) ⊕C ⊥ . scribed by the edges: each operator is the product of X
| {z }
C or Z operators of the elementary tetrahedrons adjacent
to a given edge (each operator implies up to 6 physical
The gauge group acts trivially on the logical qubits and qubits).
is the Pauli group of the gauge qubits while the logical
group acts trivially on the gauge qubits and is the Pauli The stabilizer generators of the (1, 1) and (1, 2) codes
group of the logical qubits (up to phases). This ensures (the (1, 2) code refers to the code used to implement the T
that gauge operator measurements don’t modify the log- gate [22]) which are described by the vertices and edges,
ical qubits. are deduced from the values of measured operators. More
precisely, the operator corresponding to a vertex can be
A gauge fixing operation consists in switching from a written as the product of the operators corresponding to
code to another one such that the new stabilizer group edges starting at the given vertex and ending on vertices
includes the original one while being included into the of a common color. Three choices of color are possible,
original gauge group, while keeping unmodified the logi- allowing one to recover in three different ways an opera-
cal group. The decomposition associated to the original tor corresponding to a vertex. This redundancy can be
code used for achieving fault-tolerant error correction in only
one measurement of the (gauge) operators related to the
H = (A ⊗ B) ⊕C ⊥
| {z } edges [22].
C
then becomes of the form Let ncode be the index of the code which is the number
of vertices of the same color on one edge of the large
H = (A ⊗ B 0 ) ⊕ (A ⊗ B 00 ) ⊕ C ⊥ tetrahedron (denoted as n in [22]). The code distance
| {z } is given by d = 2ncode + 1, and the number of physical
C 0⊥ 3
qubits is 1 + 4ncode + 6n2code + 4n3code = d 2+d [22, 48].
where B 0 is the new gauge qubit space. As a consequence,
a valid code-word for this new code is also valid for the
initial one. The passage of the latter to the new code
is done by measuring the generators of the gauge group, 3. Slicing of the code structure
the results of these measurements giving the correction
to apply on B 0 ⊕ B 00 to remove the components on B 00 .
To process the information, the code structure is de-
For 3D gauge color codes, code switching allows a composed into slices, each slice being map successively
transversal error-corrected implementation of a univer- into the 2D processor. While several cuts in slices are
sal set of gates [22]. possible, we choose slices orthogonal to two faces (see
Figure 14). The processor need to be sized to fit in the
larger slice, that join the edge not included into any of
the two faces to the middle of the opposing edge — the
2. Code geometrical structure magenta slice in Figure 14.
n ne m we wm d nqubits texp logical qubits total modes spatial modes temporal modes all memory correction
6 6 4 3 2 7 316 1 min 38 6 650 3 002 5 95 µs
829 1 242 26 3 3 41 10 244 11 days 3 396 117 097 476 8 697 156 39 66 ms
2 048 3 029 30 3 3 47 13 436 177 days 8 284 430 229 540 27 825 956 45 186 ms
Table I. For different integer sizes n and corresponding exponent size ne (∼ 1.5n), the table presents the optimal set of
parameters, processor size and computation runtime, and the memory requirements.
– Some operations in the adder can be paral- in a very high absorption coefficient α = 4.0 m−1 . If
lelized (see Figure 13). The controlled NOT we assume a L = λ/2 cavity, unit absorption and re-
operations aligned vertically can be applied at emission efficiencies are obtained if the quality factor is
the same time. Q = F = 2π/(αλ) ≈ 26 for a 5 GHz cavity. In this
– The runtime is dominated by the time spent low-Q regime, κ Γ and a coherence time of a few hun-
to implement the CNOT gates of the quan- dreds of microseconds would translate into a multi-mode
tum lookup circuit (see Figure 10) and they capacity of a few tens of modes. By working with crys-
are easily parallelizable. A full parallelization, tals having lower doping concentrations, the coherence
would reduce the factorization of 2 048 RSA time can be significantly increased [50], while still reach-
integers to about 27 days, at the cost of using ing the impedance matching point with low-Q resonators.
about 12 millions qubits in the processor. In this case, a few thousand modes might realistically be
stored very efficiently. Rare-earth doped materials is not
– Oblivious carry runways allows parallelization
the only option and other candidates such as negatively
of the adders [36].
charged nitrogen vacancy color centers in diamond [51] or
– Other type of adders could exploit fur- bismuth donors in silicon [52] may be even more promis-
ther parallelizations, for instance lookahead ing.
adders [44]
– During a product-addition operation, the dif-
ferent additions can be parallelized by com-
puting separately partial sums. ∗
[email protected]
– During the exponentiation, the different mul- †
https://quantum.paris
tiplications can be parallelized by computing [1] M. Kjaergaard, M. E. Schwartz, J. Braumüller,
separately partial products. P. Krantz, J. I.-J. Wang, S. Gustavsson, and W. D.
Oliver, Superconducting qubits: Current state of play,
• The qubit number can be reduced using another Annual Review of Condensed Matter Physics 11, 369
slicing of the code structure, at the cost of a longer (2020), 1905.13641.
computation time. For example, if one chooses to [2] C. Gidney and M. Ekerå, How to factor 2048 bit rsa inte-
cut the tetrahedron by slices parallel to a facet of gers in 8 hours using 20 million noisy qubits, 1905.09749
this tetrahedron, we estimate that a 2 048-bits RSA (2019).
[3] Y. R. Sanders, D. W. Berry, P. C. S. Costa, N. Wiebe,
integer could be factorized with 6 628 qubits in the C. Gidney, H. Neven, and R. Babbush, Compilation of
processor and 354 days. fault-tolerant quantum heuristics for combinatorial opti-
mization, 2007.07391 (2020).
[4] J. Lee, D. Berry, C. Gidney, W. J. Huggins, J. R. Mc-
Clean, N. Wiebe, and R. Babbush, Even more efficient
Appendix G: Realization combining a Rare-earth quantum computations of chemistry through tensor hy-
doped solid and a superconducting resonator percontraction, 2011.03494 (2020).
[5] D. Kielpinski, C. Monroe, and D. J. Wineland, Architec-
ture for a large-scale ion-trap quantum computer, Nature
For implementing a multimode memory with a spin- 417, 709 (2002).
echo technique, materials doped with rare-earth, such [6] D. D. Thaker, T. S. Metodi, A. W. Cross, I. L. Chuang,
and F. T. Chong, Quantum memory hierarchies: Effi-
as Erbium Er3+ provide an appealing example since
cient designs to match available parallelism in quantum
these ions have doubly-degenerate Zeeman states which computing, in 33rd International Symposium on Com-
split when an external magnetic field is applied. Sev- puter Architecture (ISCA’06) (IEEE, 2006) pp. 378–390,
eral manuscripts have reported on the successful cou- quant-ph/0604070.
pling between the crystal Er3+:Y2SiO5 and a supercon- [7] Z.-L. Xiang, S. Ashhab, J.-Q. You, and F. Nori, Hybrid
ducting microwave resonator [32–34]. Ref. [34] in par- quantum circuits: Superconducting circuits interacting
ticular reported on √
the strong coupling with a collec- with other quantum systems, Reviews of Modern Physics
85, 623 (2013), 1204.2137.
tive coupling rate g N̄ = 2π × 34 MHz and an inho-
mogeneous linewidth Γ = 2π × 12 MHz. This results
17
[8] G. Kurizki, P. Bertet, Y. Kubo, K. Mølmer, D. Pet- Matter , Ph.D. thesis (2018).
rosyan, P. Rabl, and J. Schmiedmayer, Quantum tech- [27] A. Kubica, M. E. Beverland, F. Brandão, J. Preskill, and
nologies with hybrid systems, Proceedings of the National K. M. Svore, Three-dimensional color code thresholds via
Academy of Sciences 112, 3866 (2015), 1504.00158. statistical-mechanical mapping, Physical Review Letters
[9] C. Grezes, Y. Kubo, B. Julsgaard, T. Umeda, J. Isoya, 120, 180501 (2018), 1708.07131.
H. Sumiya, H. Abe, S. Onoda, T. Ohshima, K. Naka- [28] S. J. Devitt, W. J. Munro, and K. Nemoto, Quantum
mura, I. Diniz, A. Auffeves, V. Jacques, J.-F. Roch, error correction for beginners, Reports on Progress in
D. Vion, D. Esteve, K. Mølmer, and P. Bertet, Towards Physics 76, 076001 (2013), 0905.2794.
a spin-ensemble quantum memory for superconduct- [29] M. Afzelius, N. Sangouard, G. Johansson, M. U. Staudt,
ing qubits, Comptes Rendus Physique 17, 693 (2016), and C. M. Wilson, Proposal for a coherent quantum
1510.06565. memory for propagating microwave photons, New Jour-
[10] H. Bombı́n and M. A. Martin-Delgado, Exact topologi- nal of Physics 15, 065008 (2013), 1301.1858.
cal quantum order in d = 3 and beyond: Branyons and [30] M. Afzelius and C. Simon, Impedance-matched cavity
brane-net condensates, Physical Review B 75, 075103 quantum memory, Physical Review A 82, 022310 (2010),
(2007), cond-mat/0607736. 1004.2469.
[11] P. W. Shor, Algorithms for quantum computation: dis- [31] T. Chanelière, G. Hétet, and N. Sangouard, Quantum op-
crete logarithms and factoring, in Proceedings 35th An- tical memory protocols in atomic ensembles, in Advances
nual Symposium on Foundations of Computer Science In Atomic, Molecular, and Optical Physics, Vol. 67 (El-
(IEEE Comput. Soc. Press, 1994) pp. 124–134. sevier, 2018) Chap. 2, pp. 77–150, 1801.10023.
[12] M. Ekerå and J. Håstad, Quantum algorithms for com- [32] P. A. Bushev, A. K. Feofanov, H. Rotzinger, I. Pro-
puting short discrete logarithms and factoring RSA in- topopov, J. H. Cole, C. M. Wilson, G. Fischer, A. V.
tegers, in Post-Quantum Cryptography, Lecture Notes in Lukashenko, and A. V. Ustinov, Ultralow-power spec-
Computer Science, Vol. 10346, edited by T. Lange and troscopy of a rare-earth spin ensemble using a supercon-
T. Takagi (Springer International Publishing, 2017) pp. ducting resonator, Physical Review B 84, 060501 (2011),
347–363, 1702.00249. 1102.3841.
[13] R. L. Rivest, A. Shamir, and L. Adleman, A method [33] M. U. Staudt, I.-C. Hoi, P. Krantz, M. Sandberg,
for obtaining digital signatures and public-key cryptosys- M. Simoen, P. A. Bushev, N. Sangouard, M. Afzelius,
tems, Communications of the ACM 21, 120 (1978). V. S. Shumeiko, G. Johansson, P. Delsing, and C. M. Wil-
[14] W. Diffie and M. E. Hellman, New directions in cryp- son, Coupling of an erbium spin ensemble to a supercon-
tography, IEEE Transactions on Information Theory 22, ducting resonator, Journal of Physics B: Atomic, Molec-
644 (1976). ular and Optical Physics 45, 124019 (2012), 1201.1718.
[15] Information Technology Laboratory, Digital signature [34] S. Probst, H. Rotzinger, S. Wünsch, P. Jung, M. Jerger,
standard (DSS) (2013). M. Siegel, A. V. Ustinov, and P. A. Bushev, Anisotropic
[16] P. W. Shor, Polynomial-time algorithms for prime factor- rare-earth spin ensemble strongly coupled to a supercon-
ization and discrete logarithms on a quantum computer, ducting resonator, Physical Review Letters 110, 157001
SIAM Journal on Computing 26, 1484 (1997), quant- (2013), 1212.2856.
ph/9508027. [35] R. B. Griffiths and C.-S. Niu, Semiclassical fourier trans-
[17] M. Ekerå, Modifying shor’s algorithm to compute short form for quantum computation, Physical Review Letters
discrete logarithms (2016), https://eprint.iacr.org/ 76, 3228 (1996), quant-ph/9511007.
2016/1128. [36] C. Gidney, Approximate encoded permutations and
[18] M. Ekerå, On post-processing in the quantum algorithm piecewise quantum adders, 1905.08488 (2019).
for computing short discrete logarithms (2017), https: [37] S. A. Cuccaro, T. G. Draper, S. A. Kutin, and D. P.
//eprint.iacr.org/2017/1122. Moulton, A new quantum ripple-carry addition circuit,
[19] M. Ekerå, Quantum algorithms for computing general quant-ph/0410184 (2004).
discrete logarithms and orders with tradeoffs (2018), [38] C. Gidney, Halving the cost of quantum addition, Quan-
https://eprint.iacr.org/2018/797. tum 2, 74 (2018), 1709.06648.
[20] C. Gidney, Windowed quantum arithmetic, 1905.07682 [39] V. Vedral, A. Barenco, and A. Ekert, Quantum networks
(2019). for elementary arithmetic operations, Physical Review A
[21] The code is available at https://github.com/ 54, 147 (1996), quant-ph/9511018.
ElieGouzien/factoring_with_memory. [40] C. Zalka, Shor’s algorithm with fewer (pure) qubits,
[22] H. Bombı́n, Gauge color codes: optimal transversal gates quant-ph/0601097 (2006).
and gauge fixing in topological stabilizer codes, New [41] R. Babbush, C. Gidney, D. W. Berry, N. Wiebe, J. Mc-
Journal of Physics 17, 083002 (2015), 1311.0879. Clean, A. Paler, A. Fowler, and H. Neven, Encoding elec-
[23] E. T. Campbell, B. M. Terhal, and C. Vuillot, Roads tronic spectra in quantum circuits with linear t complex-
towards fault-tolerant universal quantum computation, ity, Physical Review X 8, 041015 (2018), 1805.03662.
Nature 549, 172 (2017), 1612.07330. [42] D. W. Berry, C. Gidney, M. Motta, J. R. McClean, and
[24] H. Bombı́n, Single-shot fault-tolerant quantum error cor- R. Babbush, Qubitization of arbitrary basis quantum
rection, Physical Review X 5, 031043 (2015), 1404.5504. chemistry leveraging sparsity and low rank factorization,
[25] B. J. Brown, N. H. Nickerson, and D. E. Browne, Fault- Quantum 3, 208 (2019), 1902.02134.
tolerant error correction with the gauge color code, Na- [43] T. G. Draper, Addition on a quantum computer, quant-
ture Communications 7, 12302 (2016), 1503.08217. ph/0008033 (2000).
[26] A. M. Kubica, The ABCs of the Color Code: A Study [44] T. G. Draper, S. A. Kutin, E. M. Rains, and K. M. Svore,
of Topological Quantum Codes as Toy Models for Fault- A logarithmic-depth quantum carry-lookahead adder,
Tolerant Quantum Computation and Quantum Phases Of Quantum Information and Computation 4-5, 351 (2006),
18
quant-ph/0406142. (2020).
[45] Due to the measurement-based uncomputation, it is often [50] M. Le Dantec, M. Rancic, E. Flurin, D. Vion, D. Es-
more efficient to implement the non-Clifford operations teve, P. Bertet, P. Goldner, T. Chanelière, B. Sylvain,
through AND computation. In case of direct implemen- S. Lin, and R. B. Liu, Twenty millisecond electron-spin
tation of Toffoli gates, the circuit cost could be slightly coherence in an erbium doped crystal, in Bulletin of the
reduced. American Physical Society (American Physical Society,
[46] D. Poulin, Stabilizer formalism for operator quantum 2021).
error correction, Physical Review Letters 95, 230504 [51] Y. Kubo, C. Grezes, A. Dewes, T. Umeda, J. Isoya,
(2005), quant-ph/0508131. H. Sumiya, N. Morishita, H. Abe, S. Onoda, T. Ohshima,
[47] P. Zanardi, D. A. Lidar, and S. Lloyd, Quantum tensor V. Jacques, A. Dréau, J.-F. Roch, I. Diniz, A. Auffeves,
product structures are observable induced, Physical Re- D. Vion, D. Esteve, and P. Bertet, Hybrid quantum
view Letters 92, 060402 (2004), quant-ph/0308043. circuit with a superconducting qubit coupled to a spin
[48] M. E. Beverland, A. Kubica, and K. M. Svore, The cost ensemble, Physical Review Letters 107, 220501 (2011),
of universality: A comparative study of the overhead of 1110.2978.
state distillation and code switching with color codes, [52] V. Ranjan, J. O’Sullivan, E. Albertinale, B. Albanese,
2101.02211 (2021). T. Chanelière, T. Schenkel, D. Vion, D. Esteve, E. Flurin,
[49] F. Boudot, P. Gaudry, A. Guillevic, N. Heninger, J. J. L. Morton, and P. Bertet, Multimode storage of
E. Thomé, and P. Zimmermann, Factorization of rsa-250 quantum microwave fields in electron spins over 100 ms,
Physical Review Letters 125, 210505 (2020), 2005.09275.