Quantum Approximate Optimization Algorithm: Performance, Mechanism, and Implementation On Near-Term Devices

Quantum Approximate Optimization Algorithm: Performance, Mechanism, and
Implementation on Near-Term Devices

Leo Zhou,1, ∗ Sheng-Tao Wang,1, † Soonwon Choi,1, 2 Hannes Pichler,3, 1 and Mikhail D. Lukin1
1
Department of Physics, Harvard University, Cambridge, MA 02138, USA
2
Department of Physics, University of California Berkeley, Berkeley, CA 94720, USA
3
ITAMP, Harvard-Smithsonian Center for Astrophysics, Cambridge, MA 02138, USA
(Dated: November 12, 2019)
The Quantum Approximate Optimization Algorithm (QAOA) is a hybrid quantum-classical vari-
ational algorithm designed to tackle combinatorial optimization problems. Despite its promise for
near-term quantum applications, not much is currently understood about QAOA’s performance
beyond its lowest-depth variant. An essential but missing ingredient for understanding and de-
ploying QAOA is a constructive approach to carry out the outer-loop classical optimization. We
arXiv:1812.01041v2 [quant-ph] 9 Nov 2019
provide an in-depth study of the performance of QAOA on MaxCut problems by developing an effi-
cient parameter-optimization procedure and revealing its ability to exploit non-adiabatic operations.
Building on observed patterns in optimal parameters, we propose heuristic strategies for initializ-
ing optimizations to find quasi-optimal p-level QAOA parameters in O(poly(p)) time, whereas the
standard strategy of random initialization requires 2O(p) optimization runs to achieve similar perfor-
mance. We then benchmark QAOA and compare it with quantum annealing, especially on difficult
instances where adiabatic quantum annealing fails due to small spectral gaps. The comparison re-
veals that QAOA can learn via optimization to utilize non-adiabatic mechanisms to circumvent the
challenges associated with vanishing spectral gaps. Finally, we provide a realistic resource analysis
on the experimental implementation of QAOA. When quantum fluctuations in measurements are
accounted for, we illustrate that optimization will be important only for problem sizes beyond nu-
merical simulations, but accessible on near-term devices. We propose a feasible implementation of
large MaxCut problems with a few hundred vertices in a system of 2D neutral atoms, reaching the
regime to challenge the best classical algorithms.
I. INTRODUCTION by classical computers [10]. It is thus an appealing algo-

rithm to explore quantum speedups on near-term quan-
As quantum computing technology develops, there is tum machines.
a growing interest in finding useful applications of near- However, very little is known about QAOA beyond
term quantum machines [1]. In the near future, however, the lowest depth. While QAOA is known to monoton-
the number of reliable quantum operations will be lim- ically improve with depth and succeed in the p → ∞
ited by noise and decoherence. As such, hybrid quantum- limit [2], its performance when 1 < p < ∞ is largely un-
classical algorithms [2–4] have been proposed to make the explored. In fact, it has been argued that one needs to go
best of available quantum resources and integrate them beyond low-depth QAOA in order to compete with the
with classical routines. The Quantum Approximate Op- best classical algorithm for some problems on bounded-
timization Algorithm (QAOA) [2] and the Variational degree graphs [11, 12]. It thus remains a critical problem
Quantum Eigensolver [3] are such algorithms put forward to assess QAOA at intermediate depths where one may
to address classical combinatorial optimization and quan- hope for a quantum computational advantage. One ma-
tum chemistry problems, respectively. Proof-of-principle jor hurdle lies in the difficulty to efficiently optimize in
experiments running these algorithms have already been the non-convex, high-dimensional parameter landscape.
demonstrated in the lab [5–8]. Without constructive approaches to perform the param-
In these hybrid algorithms, a quantum processor pre- eter optimization, any potential advantages of the hybrid
pares a quantum state according to a set of variational algorithms could be lost [13].
parameters. Using measurement outputs, the parameters
In this work, we contribute, in three major aspects,
are then optimized by a classical computer and fed back
to the understanding and applicability of QAOA on
to the quantum machine in a closed loop. In QAOA,
near-term devices, with a focus on MaxCut problems.
the state is prepared by a p-level circuit specified by
First, we develop heuristic strategies to efficiently opti-
2p variational parameters. Even at the lowest circuit
mize QAOA variational parameters. These strategies are
depth (p = 1), QAOA has non-trivial provable perfor-
found, via extensive benchmarking, to be quasi-optimal
mance guarantees [2, 9] and is not efficiently simulatable
in the sense that they usually produce known global op-
tima. The standard approach with random initializa-
tion generically require 2O(p) optimization runs to sur-
∗ [email protected] pass our heuristics. Secondly, we benchmark the perfor-
† [email protected]; mance of QAOA and compare it with quantum anneal-
L.Z. and S.-T.W. contributed equally to this work. ing. On difficult graph instances where the minimum
2
spectral gap is very small, the time required for quantum variational parameters
annealing to remain adiabatic is very long as it scales in-
versely with the square of the gap. For these instances,
measure
QAOA is found to outperform adiabatic quantum an-
Optimization
nealing by multiple orders of magnitude in computation
time. Lastly, we provide a detailed resource analysis on
the experimental implementation of QAOA with near-
term quantum devices. Taking into account of quantum
fluctuations in projective measurements, we argue that
optimization will play a role only for much larger problem
sizes than numerically accessible ones. We also propose a
2D physical implementation of QAOA on MaxCut with 4
2
a few hundred Rydberg-interacting atoms, which can be
put to the test against the best classical algorithm for
potential quantum advantages. 1
Our main results can be summarized as follows. By 3
performing extensive searches in the entire parameter 5
space, we discover persistent patterns in the optimal pa-
rameters. Based on the observed patterns, we develop
strategies for selecting initial parameters in optimization, FIG. 1. (a) Schematic of a p-level Quantum Approximation
which allow us to efficiently optimize QAOA at a cost Optimization Algorithm [2]. A quantum circuit takes input
x
scaling polynomially in p. This is in stark contrast to the |+i⊗n and alternately applies e−iγi HC and Xβi = e−iβi σ ,
and the final state is measured to obtain expectation value
2O(p) optimization runs required by random initializa-
with respect to the objective function HC . This is fed to a
tion approaches. We also propose a new parametrization ~ that
(classical) optimizer to find the best parameters (~
γ , β)
of QAOA that may significantly simplify optimization maximizes hHC i. (b) An example of a MaxCut problem on
by reducing the dimension of the search space. Using a 5-vertex graph, where one seeks an assignment of spin vari-
our heuristic strategy, we benchmark the performance ables on the vertices such that the sum of edge weights be-
of QAOA on many instances of MaxCut up to N ≤ 22 tween anti-aligned spins is maximized (black edges).
vertices and level p ≤ 50. Comparing QAOA with quan-
tum annealing, we find the former can learn via opti-
mization to utilize diabatic mechanisms [14–17] and over- mechanism of QAOA. Lastly, we discuss considerations
come the challenges faced by adiabatic quantum anneal- for experimental implementations for large problem sizes
ing due to very small spectral gaps. Considering realistic in Sec. VI.
experimental implementations, we also study the effects
of quantum “projection noise” in measurement: we find
that, for numerically accessible problem sizes, QAOA can
II. QUANTUM APPROXIMATE
often obtain the solution among measurement outputs OPTIMIZATION ALGORITHM
before the best variational parameters are found. Pa-
rameter optimization will be more useful at large system
sizes (a few hundred vertices), as one expects the prob- Many interesting real-world problems can be framed as
ability of finding the solution from projective measure- combinatorial optimization problems [20, 21]. These are
ments to decrease exponentially. At such system sizes, problems defined on N -bit binary strings z = z1 · · · zN ,
we analyze a procedure to make graphs more experimen- where the goal is to determine a string that maximizes
tally realizable by reducing the required range of qubit a given classical objective function C(z) : {+1, −1}N 7→
interactions via vertex renumbering. Finally, we discuss R≥0 . An approximate optimization algorithm aims to
a specific implementation using neutral atoms interacting find a string z that achieves a desired approximation ratio
via Rydberg excitations [18, 19], where a 2D implemen- C(z)
tation with a few hundred atoms appears feasible on a ≥ r∗ , (1)
near-term device. Cmax
The rest of the paper is organized in the following or- where Cmax = maxz C(z).
der: In Sec. II, we review QAOA and the MaxCut prob- The Quantum Approximate Optimization Algorithm
lem. In Sec. III, we describe some patterns found for (QAOA) is a quantum algorithm recently introduced to
QAOA optimal parameters and introduce heuristic opti- tackle these combinatorial optimization problems [2]. To
mization strategies based on the observed patterns. We encode the problem, the classical objective function can
benchmark our heuristic strategies and study the per- be converted to a quantum problem Hamiltonian by pro-
formance of QAOA on typical MaxCut graph instances moting each binary variable zi to a quantum spin σiz :
in Sec. IV. We then, in Sec. V, compare QAOA with
quantum annealing, shedding light on the non-adiabatic HC = C(σ1z , σ2z , · · · , σN
z
). (2)
3
For p-level QAOA, which is visualized in Fig. 1(a), we ratio guarantee on generic graphs belongs to Goemans-
initialize the quantum processor in the state |+i⊗N , and Williamson [24], which achieves r∗ ≈ 0.87856 using semi-
then apply the problem Hamiltonian HC and a mixing definite programming. This lower bound can be raised
to r∗ ≈ 0.9326 when restricted to u3R graphs [25]. Farhi
PN
Hamiltonian HB = j=1 σjx alternately with controlled
durations to generate a variational wavefunction et al. [2] were able to prove that QAOA at level p = 1
achieves r∗ ≥ 0.6924 for u3R graphs, using the fact that
|ψp (~ ~ = e−iβp HB e−iγp HC · · · e−iβ1 HB e−iγ1 HC |+i⊗N ,
γ , β)i Fp can be written as a sum of quasi-local terms, each
(3) corresponding to a subgraph involving edges at most p
which is parameterized by 2p variational parameters γi steps away from a given edge. However, this approach
and βi (i = 1, 2, · · · p). We then determine the expecta- to bound r∗ quickly becomes intractable since the local-
tion value HC in this variational state ity of each term (i.e., size of each subgraph) grows ex-
ponentially in p, as does the number of subgraph types
Fp (~ ~ = hψp (~
γ , β) ~
γ , β)|HC |ψp (~
~
γ , β)i, (4) involved.
which is done by repeated measurements of the quantum QAOA is believed to be a promising algorithm for
system in the computational basis. A classical computer multiple reasons [2, 9, 10, 26–31]. As mentioned above,
is used to search for the optimal parameters (~ ~ ∗ ) so as
γ∗, β for certain cases one can prove a guaranteed minimum
to maximize the averaged measurement output Fp (~γ , β), ~ approximation ratio when p = 1 [2, 9]. Addition-
ally, under reasonable complexity-theoretic assumptions,
~ ∗ ) = arg max Fp (~
γ∗, β
(~ ~
γ , β). (5) QAOA cannot be efficiently simulated by any classi-
~ ~
γ ,β cal computer even when p = 1, making it a suitable
candidate algorithm to establish the so-called “quantum
This is typically done by starting with some initial guess supremacy” [10]. It has also been argued that the square-
of the parameters and performing simplex or gradient- pulse (“bang-bang”) ansatz of dynamical evolution, of
based optimization. A figure of merit for benchmarking which QAOA is an example, is optimal given a fixed
the performance of QAOA is the approximation ratio quantum computation time [28]. In general, the per-
formance of QAOA can only improve with increasing
~ ∗)
γ∗, β
Fp (~
r= . (6) p, achieving r → 1 when p → ∞ since it can approx-
Cmax imate adiabatic quantum annealing via Trotterization;
this monotonicity makes it more attractive than quan-
The framework of QAOA can be applied to general tum annealing whose performance may decrease with in-
combinatorial optimization problems. Here, we focus creased run time [14].
on its application to an archetypical problem called
MaxCut, which is a combinatorial problem whose ap- While QAOA has a simple description, not much is
proximate optimization beyond a minimum ratio r∗ is currently understood beyond p = 1. To establish poten-
NP-hard [22, 23]. The MaxCut problem, visualized in tial quantum advantage over classical algorithms, it is of
Fig. 1(b), is defined for any input graph G = (V, E). critical importance to investigate QAOA at intermediate
Here, V = {1, 2, . . . , N } denotes the set of vertices and depths (p > 1). Refs. [2, 11, 12] have shown that QAOA
E = {(hi, ji , wij )} is the set of edges, where wij ∈ R≥0 have limited performance on some problems on bounded-
is the weight of the edge hi, ji connecting vertices i and degree graphs when the depth is shallow. This limitation
j. The goal of MaxCut is to maximize the following ob- may result from the fact that the algorithm cannot “see”
jective function the entire graph at low depth. It thus indicates that one
X wij may need the depth of QAOA to grow with the system
HC = (1 − σiz σjz ), (7) size (e.g., p ≥ log N ) in order to outperform the best clas-
2
hi,ji sical algorithms. For the toy example of MaxCut on u2R
graphs, i.e. 1D antiferromagnetic rings, it is conjectured
where an edge hi, ji contributes with weight wij if and that QAOA yields r ≥ (2p + 1)/(2p + 2) based on numer-
only if spins σiz and σjz are anti-aligned. ical evidence [2, 26][32]. In another example of Grover’s
For simplicity, we restrict our attention to MaxCut on unstructured search problem among n items, QAOA
d-regular graphs, where every vertex is connected to ex- √ is
shown to be able to find the target state with p = Θ( n),
actly d other vertices. We study two classes of graphs: achieving the optimal query complexity within a constant
the first is unweighted d-regular graphs (udR), where all factor [30]. For more general problems, Farhi et al. [2]
edges have equal weights wij = 1; the second is weighted proposed a simple approach by discretizing each param-
d-regular graphs (wdR), where the weights wij are cho- eter into O(poly(N )) grid points; this, however, requires
sen uniformly at random from the interval [0, 1]. It is examining N O(p) possibilities at level p, which quickly
NP-hard to design an algorithm that guarantees a min- becomes impractical as p grows. Efficient optimization
imum approximation ratio of r∗ ≥ 16/17 for MaxCut of QAOA parameters and understanding the algorithm
on all graphs [22], or r∗ ≥ 331/332 when restricted to for 1 < p < ∞ remain outstanding problems. We address
u3R graphs [23]. The current record for approximation these problems in the present work.
4
0.4 0.2 udR graphs creates redundancy since e−iπHC = 1 if d is

p=7 p=7 even, and (σ z )⊗N if d is odd. These symmetries allow us
0.2 0.1
to restrict βi ∈ [− π4 , π4 ) in general, and γi ∈ [− π2 , π2 ) for
udR graphs.
0 0
1 2 3 4 5 6 7 1 2 3 4 5 6 7 We start by numerically investigating the optimal
QAOA parameters for MaxCut on random u3R and w3R
p=3 p=4 p=5 graphs with vertex number 8 ≤ N ≤ 22, with exten-
0.4 0.4 0.4 sive searches in the entire parameter space. For each
0.3 0.3 0.3 graph instance and a given level p, we choose a random
0.2 0.2 0.2 initial point (seed) in the parameter space [34] and use
0.1 0.1 0.1 a commonly used, gradient-based optimization routine
0 0 0 known as BFGS [33] to find a local optimum (~ γL, β ~ L ).
1 2 3 1 2 3 4 1 2 3 4 5
This local optimization is repeated with sufficiently many
0.2 0.2 0.2 different seeds to find the global optimum (~ γ∗, β~ ∗ ) [35].
0.15 0.15 0.15 We then reduce the degeneracies of the optimal param-
0.1 0.1 0.1 eters (~ ~ ∗ ) using the aforementioned symmetries. In
γ∗, β
0.05 0.05 0.05 all cases examined, we find that the global optimum is
p=3 p=4 p=5 non-degenerate up to these symmetries.
0 0 0
1 2 3 1 2 3 4 1 2 3 4 5
After performing the above numerical experiment for

FIG. 2. (a) Optimal QAOA parameters (~ ~ ∗ ) for an exam-
γ∗, β 100 random u3R and w3R graphs with various vertex
ple instance of MaxCut on a 16-vertex unweighted 3-regular number N , we discover some patterns in the optimal pa-
(u3R) graph at level p = 7. (b) The parameter pattern visu- rameters (~ ~ ∗ ). Generically, the optimal γ ∗ tends to
γ∗, β i
alized by plotting the optimal parameters of 40 instances of increase smoothly with i = 1, 2, · · ·, p, while βi∗ tends to
16-vertex u3R graphs, for 3 ≤ p ≤ 5. Each dashed line con- decrease smoothly, as shown for the example instance in
nects parameters for one particular graph instance. For each Fig. 2(a). In Fig. 2(b), we illustrate the pattern by si-
instance and each p, we use the classical BFGS optimization
multaneously plotting the optimal parameters for 40 in-
routine [33] from 104 random initial points, and keep the best
parameters. stances of 16-vertex u3R graphs for 3 ≤ p ≤ 5. Further-
more, for a given class of graphs, the optimal parameters
are observed to roughly occupy the same range of val-
III. OPTIMIZING VARIATIONAL ues as p is varied. Similar patterns are found for w3R
PARAMETERS graphs and weighted complete graphs, which we illus-
trate in Appendix A. This demonstrates a clear pattern
In this section, we address the issue of parameter op- in the optimal QAOA parameters that we can exploit in
timization in QAOA, since searching for the best pa- the optimization, as we discuss later in Sec. III B. Similar
rameters via standard approaches that rely on random patterns are found for parameters up to p . 15, if the
initialization generally become exponentially difficult as number of random seeds is increased accordingly.
level p increases. We mostly restrict our discussion to
randomly generated instances of u3R and w3R graphs. We give two remarks here: First, we note that sur-
Similar results are found for u4R and w4R graphs, as prisingly, even at small depth, this parameter pattern is
well as complete graphs with random weights. We utilize reminiscent of adiabatic quantum annealing where HC
patterns in the optimal parameters to develop heuristic is gradually turned on while HB is gradually turned off.
strategies that can efficiently find quasi-optimal solutions However, we will demonstrate in Sec. V that the mech-
in O(poly(p)) time. anism of QAOA goes beyond the adiabatic principle.
Secondly, we note that these optimal parameters have
a small spread over many different instances. This is be-
A. Patterns in optimal parameters ~ is a sum of terms
cause the objective function Fp (~γ , β)
corresponding to subgraphs involving vertices that are
Before searching for patterns in optimal QAOA pa- distance ≤ p away from every edge. At small p, there
rameters, it is useful to eliminate degeneracies in the are only a few relevant subgraph types that enter into
parameter space due to symmetries. Generally, QAOA Fp and effectively determine the optimal parameters. As
has a time-reversal symmetry, Fp (~ ~ = Fp (−~γ , −β),
γ , β) ~ N → ∞ and at a fixed finite p, we expect the probability
since both HB and HC are real-valued. For QAOA ap- of a relevant subgraph type appearing in a random graph
plied to MaxCut, there is an additional Z2 symmetry, to approach a fixed fraction. This implies that the dis-
as e−i(π/2)HB ≡ (σ x )⊗N commutes through the circuit. ~ ∗ ) should converge
tribution of optimal parameters (~γ ∗ , β
Furthermore, the structure of the MaxCut problem on to a fixed set of values in this limit.
5
B. Heuristic optimization strategy for large p 0.5
0.2
The optimal parameter patterns observed above indi-
cate that generically, there is a slowly varying continuous 0.1
curve that underlies the parameters γi∗ and βi∗ . More- 0.05
FOURIER
over, this curve changes only slightly from level p to p+1. 1000 RI runs avg.
0.02
These observations allow us to choose educated guesses best of 1000 RI runs
of variational parameters for (p+1)-level QAOA based on 0.01
optimized parameters from p-level (or in general from q- 0 2 4 6 8 10
level, where q ≤ p). These educated guesses can serve as 5
10
# randomly initialized (RI)

initial points fed to various classical optimization routines u3R graphs median
runs to match heuristic

that find a nearby local optimum. Based on this idea, we 4
w3R graphs median
10 fit u3R: 0.42 (2.63)p
have developed two types of heuristic strategies for ini-
tializing optimization. We give a high-level overview of 3 fit w3R: 0.55 (4.26)p
10
these heuristics in this section, while deferring the de-
10 2
tails of its implementation to Appendix B. While these
heuristics are not guaranteed to find the global optimum
of QAOA parameters, we show that in Sec. IV A, it can 1
10
produce, in O(poly(p)) time, quasi-optima that require
2O(p) randomly initialized optimization runs to surpass. 10 0
Consequently, this allows us to study the performance 1 2 3 4 5 6 7
and mechanism of QAOA beyond p = 1.
The first heuristic strategy, which we call INTERP, FIG. 3. (a) Comparison between our FOURIER heuris-
uses linear interpolation to choose initial parameters. tic and the random initialization (RI) approach for optimiz-
Starting at level p = 1, we optimize, and linearly interpo- ing QAOA, on an example instance of 16-vertex w4R graph.
late the curve formed by optimized parameters at level p The figure of merit 1 − r, where r is the approximation ra-
tio, is plotted as a function of QAOA level p on a log-linear
to extract a set of initial parameters for level p + 1.
scale. The RI points are obtained by optimizing from 1000
The second heuristic strategy, which we call randomly generated initial parameters, averaged over 10 such
FOURIER, uses a new parameterization of QAOA. In- realizations. (b) The median number of randomly initialized
stead of using the 2p parameters (~ ~ ∈ R2p , we switch
γ , β) optimization runs needed to obtain an approximation ratio as
2q
~ ) ∈ R , where the individual ele- good as our FOURIER heuristic, for 40 instances of 16-vertex
to 2q parameters (~ u, v
ments γi and βi are written as functions of (~ ~ ) through
u, v u3R and w3R graphs. A log-linear scale is used, and expo-
the following transformation: nential curves are fitted to the results. Error bars are 5th and
95th percentiles.
q

X 1 1 π
γi = uk sin k − i− ,
2 2 p
k=1
q (8)
X 1 1 π
βi = vk cos k − i− . for MaxCut problems. The INTERP heuristic is simpler
2 2 p
k=1 and may work better on other problems. More details on
the implementation of the heuristics and their compari-
These transformations are known as Discrete son can be found in Appendix B.
Sine/Cosine Transform, where uk and vk can be
interpreted as the amplitude of k-th frequency com- We stress that these heuristic strategies are developed
ponent for ~ γ and β, ~ respectively. In this strategy,
to generate good initial points for optimization. These
when optimizing level p + 1, the initial parameters are initial points can then be fed to standard optimization
generated by simply re-using the optimized amplitudes routines such as gradient descent, BFGS [33], Nelder-
u∗ , v
(~ ~ ∗ ) from level p. Note that when q ≥ p, the (~ ~)
u, v Mead [36], and Bayesian Optimization [37]. This is in
parametrization is capable of describing all possible contrast to the standard strategy of random initializa-
QAOA protocols at level p. However, the smoothness tion (RI), where one picks a random set of parameters to
of the optimal parameters (~ ~ implies that only the
γ , β) begin optimization. In order to find the global optimum
low-frequency components are important. Thus, we in a highly non-convex landscape, the number of RI runs
can also consider the case where q is a fixed constant needed generically scales exponentially with the number
independent of p, so the number of parameters is of parameters, which becomes intractable for large num-
bounded even as the QAOA circuit depth increases. ber of parameters. In the following section, we will com-
While both heuristics work well, we have decided to fo- pare our heuristics to the RI approach, and find that an
cus on using the FOURIER heuristics in the main text, exponential number of RI runs is needed to match the
because we find that it gives a slight edge in performance performance of our heuristics.
6
0
IV. PERFORMANCE OF HEURISTICALLY 10
OPTIMIZED QAOA 4
2
10 -1
A. Comparison between our heuristics and 0
randomly initialized (RI) optimizations N=8 10 15 20
-2 N=10
10
N=12
Here, we compare the performance of our heuristic N=14
-3 N=16
strategies to the standard strategy of random initializa- 10 N=18
tion (RI). In Fig. 3(a), we show the result for an ex- N=20
ample instance of 16-vertex w4R graph. At low p, our N=22
-4
10
FOURIER heuristic strategies perform just as well as the 0 5 10 15 20
best out of 1000 RI optimization runs—both are able to 0
10 2
find the global optimum. The average performance of
the RI strategy, on the other hand, is much worse than 1
our heuristics. This indicates that the QAOA parame- -1
ter landscape is highly non-convex and filled with low- 10 0
10 15 20
quality, non-degenerate local optima. When p ≥ 5, our
heuristic optimization outperforms the best RI run. To
estimate the number of RI runs needed to find an op- 10 -2
timum with the same or better approximation ratio as
our FOURIER heuristics, we generate 40 instances of 16-
vertex u3R and w3R graphs, and perform 40000 RI op-
10 -3
timization for each instance at each level p. In Fig. 3(b), 0 5 10 15 20
the median number of RI runs needed to match our
heuristic is shown to scale exponentially with p. There- FIG. 4. Average performance of QAOA as measured by the
fore, our heuristics offer a dramatic improvement in the fractional error 1 − r, plotted on log-linear scale. The results
resource required to find good parameters for QAOA. As are obtained by applying our heuristic optimization strategies
we have verified with an excessive number of RI runs, the FOURIER to up to 100 random instances of (a) u3R graphs
heuristics usually find the global optima. and (b) w3R graphs. Differently colored lines correspond to
We remark that although we mostly used a gradient- fitted lines to the average for different system size N , where
−p/p0
based optimization routine (BFGS) in our numerical sim- the model function√ is 1 − r ∝ e for unweighted graphs
− p/p0
ulations, non-gradient based routines, such as Nelder- and 1 − r ∝ e for weighted graphs. Insets show the
Mead [36], work equally well with our heuristic strategies. dependence of the fit parameters p0 on the system size N .
The choice to use BFGS is mainly motivated by the sim-
ulation speed. Later in Sec. VI, we account for the mea-
surement costs in estimating the gradient using a finite- MaxCut state whenever the fractional error goes be-
difference method, and perform a full Monte-Carlo simu- low ∼ 10−2 . This good performance can be understood
lation of the entire QAOA algorithm, including quantum by interpreting QAOA as Trotterized quantum anneal-
fluctuations in the determination of Fp . ing, especially when the optimized parameters are of
the pattern seen in Fig. 2, where one initializes in the
ground state of −HB and evolve with HB (and HC ) with
B. Performance of QAOA on typical instances smoothly decreasing (and increasing) durations. The
equivalent total annealing time T is approximately pro-
With our heuristic optimization strategies in hand, we portional to the level p, since the individual parameters
study the performance of intermediate p-level QAOA on γi , βi = O(1) and correspond to the evolution times un-
many graph instances. We consider many randomly gen- der HC and HB . If T is much longer than 1/∆2min ,
erated instances u3R and w3R graphs with vertex num- where ∆min is the minimum spectral gap, quantum an-
ber 8 ≤ N ≤ 22, and use our FOURIER strategy to nealing will be able to find the exact solution to MaxCut
find optimal QAOA parameters for level p ≤ 20. In the (ground state of −HC ) by the adiabatic theorem [17], and
following discussion, we use the fractional error 1 − r to achieve exponentially small fractional error as predicted
assess the performance of QAOA. by Landau-Zener [38]. Numerically, we find that the min-
We first examine the case of unweighted graphs, specif- imum gaps of these u3R instances when running quantum
ically u3R graphs. In Fig. 4(a), we plot the fractional annealing are on the order of ∆min & 0.2. It is thus not
error 1 − r as a function of QAOA’s level p. Here, we surprising that QAOA achieves exponentially small frac-
see that, on average, 1 − r ∝ e−p/p0 appears to decay tional error on average, since it is able to prepare the
exponentially with p. Note that since the instances stud- ground state of −HC through the adiabatic mechanism
ied are u3R graphs with system size N ≤ 22, where for these large-gap instances. Nevertheless, as we will see
Cmax ≤ |E| ≤ 33, we have essentially prepared the in the following section, this exponential behavior breaks
7
down for hard instances, where the gap is small. where t ∈ [0, T ] and T is the total annealing time.
Let us now turn to the case of w3R graphs. As shown The initial state is prepared to be the ground state of
in Fig. 4(b),
√ the fractional error appears to scale as HQA (s = 0), i.e., |ψ(0)i = |+i⊗N . The ground state of
1−r ∝ e − p/p0
. We note that the stretched-exponential the final Hamiltonian, HQA (s = 1), corresponds to the
scaling is true in the average sense, while individual in- solution of the MaxCut problem encoded in HC [43]. In
stances have very different behavior. For easy instances adiabatic QA, the algorithm relies on the adiabatic the-
whose corresponding minimum gaps ∆min are large, ex- orem to remain in the instantaneous ground state along
ponential scaling of the fractional error is found. For the annealing path, and solves the computational prob-
more difficult instances whose minimum gaps are small, lem by finding the ground state at the end. To guarantee
we find that their fractional errors reach plateaus at inter- success, the necessary run time of the algorithm typically
mediate p, before decreasing further when p is increased. scales as T = O(1/∆2min ), where ∆min is the minimum
We analyze these hard instances in more depth in the spectral gap [17]. Consequently, adiabatic QA becomes
following Section V. Surprisingly, when we average over inefficient for instances where ∆min is extremely small.
randomly generated instances, the fractional error is fit- We refer to these graph instances as hard instances (for
ted remarkably well by a stretched-exponential function. adiabatic QA).
These results of average performance of QAOA are Beyond the completely adiabatic regime, there is often
interesting despite considerations of finite-size effect. a tradeoff between the success probability (ground state
While the decay constant p0 does appear to depend on population pGS (T )) and the run time (annealing time T ):
the system size N as shown in the insets of Fig. 4, our one can either run the algorithm with a long annealing
finite-size simulations cannot conclusively determine the time to obtain a high success probability or run it mul-
exact scaling. Although it remains unknown whether the tiple times at a shorter time to find the solution at least
(stretched)-exponential scaling will start to break down once. A metric often used to determine the best balance
at larger system sizes, if the trend continues to system is the time-to-solution (TTS) [17]:
size of N = 100 ∼ 1000, then QAOA will be practi-
ln(1 − pd )
cally useful in solving interesting MaxCut instances, bet- TTSQA (T ) = T (10)
ter or on par with other known algorithms, in a regime ln[1 − pGS (T )]
where finding the exact solution will be infeasible. Even TTSopt
QA = min TTSQA (T ). (11)
T >0
if QAOA fails for the worst-case graphs, it can still be
practically useful, if it performs well on a randomly cho- TTSQA (T ) measures the time required to find the ground
sen graph of large system size. state at least once with the target probability pd (taken to
be 99% in this paper), neglecting non-algorithmic over-
heads such as state-preparation and measurement time.
V. ADIABATIC MECHANISM, QUANTUM
In the adiabatic regime where ln[1−pGS (T )] ∝ T ∆2min per
ANNEALING, AND QAOA
Landau-Zener formula [38], we have TTSQA ∝ 1/∆2min
which is independent of T . However, it has been ob-
In the previous section, we benchmark the performance served in some cases that it can be better to run QA
of QAOA for MaxCut on random graph instances in non-adiabatically to obtain a shorter TTS [14–17]. By
terms of the approximation ratio r. Although designed choosing the best annealing time T , regardless of adia-
for approximate optimization, QAOA is often able to find baticity, we can determine TTSopt QA as the minimum al-
the MaxCut configuration—the global optimum of the gorithmic run time of QA. For QAOA, a similar metric
problem—with a high probability as level p increases. In can be defined. The variational parameters γi and βi can
this section, we assess the efficiency of the algorithm to be regarded as the time evolved under the Hamiltonians
find the MaxCut configuration and compare it with quan- HC and HB , respectively. One can thus interpret the sum
tum annealing. In particular, we find that QAOA is not of the variational
necessarily limited by the minimum gap as in quantum Pparameters
p
to be the total “annealing”
time, i.e., Tp = i=1 (|γi∗ | + |βi∗ |) [44], and define
annealing, and explain a mechanism at work that allows
it to overcome the adiabatic limitations. ln(1 − pd )
TTSQAOA (p) = Tp (12)
ln[1 − pGS (p)]
A. Comparing QAOA with quantum annealing TTSopt
QAOA = min TTSQAOA (p), (13)
p>0
A predecessor of QAOA, quantum annealing (QA) has where pGS (p) is the ground state population after the
been widely studied for the purpose of solving combinato- optimal p-level QAOA protocol. Note this quantity does
rial optimization problems [39–42]. To find the MaxCut not take into account of the overhead in finding the opti-
configuration that maximizes hHC i, we consider the fol- mal parameters. We use TTSQAOA (p) here to benchmark
lowing simple QA protocol: the algorithm, and it should not be taken directly to be
the actual experimental run time.
HQA (s) = −[sHC + (1 − s)HB ], s = t/T, (9) To compare the algorithms, we compute TTSopt QA and
8
(a) 104
<latexit sha1_base64="v9gznLfdlpMQ0lcV8svmr+U6dWI=">AAADrHicfVLbbtNAEN3GXIq5tfDIy4jIUnkgilMEPJabhIRARWraCtuq1ptJssp619pdN41c/wGPvMJ/8TdsHFfkQhnJ1uzMnDNnZyfNBTe22/291fJu3Lx1e/uOf/fe/QcPd3YfHRtVaIZ9poTSpyk1KLjEvuVW4GmukWapwJN08m6ePzlHbbiSR3aWY5LRkeRDzqh1oW+xxQtb7tFn1dlOu9vp1gabTtg4bdLY4dlu63s8UKzIUFomqDFR2M1tUlJtORNY+XFhMKdsQkcYOVfSDE1S1pIrCFxkAEOl3Sct1NFlREkzY2ZZ6iozasdmPTcP/isXFXb4Oim5zAuLki0aDQsBVsH8/jDgGpkVM+dQprnTCmxMNWXWTckPYLmPVLYek3E0n2egphKa1KqeNKt8PwjgrabPJ2jhyxUOgsCPJU6ZyjIqB2WcauoKqqiXlLGgciQQ2iFcQrsHsa7PrtV77uRA2pBdidggqqJwlcVB5xKuQdR9HeJyXvu3GXz6Xw9Wo/Y31brf/jLLquYpt2NQOWpq3QtzCSnaKeIaPz3H9TssMb5xa+uGC47gw0Xu3myh8Loxu/0N17d10znudULnf33RPnjZbPI2eUKekj0SklfkgHwkh6RPGJHkB/lJfnkd78iLvGRR2tpqMI/JinnDPylxKrs=</latexit>
N=10
appears to be independent of the gap for all graphs that
N=12 have extremely small gaps, and beats the adiabatic TTS
N=14 (Landau-Zener line) by many orders of magnitude. This
3 N=16
10 suggests that an exponential improvement of TTS is pos-
N=18
QAOA
LZ sible with non-adiabatic mechanisms when adiabatic QA

TTSopt
2
is limited by exponentially small gaps.
10
Similarly for QA, the optimal annealing time T is
often not in the adiabatic limit for small-gap graphs.
In Fig. 5(b), we observe a strong correlation between
101
TTSopt opt
QAOA and TTSQA for each graph instance. This sug-
10-10 10-8 10-6 10-4 10-2 100 gests that QAOA is making use of the optimal annealing
4
schedule, regardless of whether a slow adiabatic evolution
(b)
<latexit sha1_base64="xJhQfSRWOt81v8if+ax/EPyOVFc=">AAADrHicfVLbbtNAEN3GXIq5tfDIy4jIUnkgilMEPJabhIRARWraCtuq1ptJssp619pdN41c/wGPvMJ/8TdsHFfkQhnJ1uzMnDNnZyfNBTe22/291fJu3Lx1e/uOf/fe/QcPd3YfHRtVaIZ9poTSpyk1KLjEvuVW4GmukWapwJN08m6ePzlHbbiSR3aWY5LRkeRDzqh1oW+xxQtb7qXPqrOddrfTrQ02nbBx2qSxw7Pd1vd4oFiRobRMUGOisJvbpKTaciaw8uPCYE7ZhI4wcq6kGZqkrCVXELjIAIZKu09aqKPLiJJmxsyy1FVm1I7Nem4e/FcuKuzwdVJymRcWJVs0GhYCrIL5/WHANTIrZs6hTHOnFdiYasqsm5IfwHIfqWw9JuNoPs9ATSU0qVU9aVb5fhDAW02fT9DClyscBIEfS5wylWVUDso41dQVVFEvKWNB5UggtEO4hHYPYl2fXav33MmBtCG7ErFBVEXhKouDziVcg6j7OsTlvPZvM/j0vx6sRu1vqnW//WWWVc1TbsegctTUuhfmElK0U8Q1fnqO63dYYnzj1tYNFxzBh4vcvdlC4XVjdvsbrm/rpnPc64TO//qiffCy2eRt8oQ8JXskJK/IAflIDkmfMCLJD/KT/PI63pEXecmitLXVYB6TFfOGfwAssiq8</latexit>
10
N=10 or a fast diabatic evolution is better. We believe that if
N=12 we use an optimized schedule beyond the linear ramp in
N=14 Eq. (9), QA should be able to match the performance of
N=16
N=18 QAOA. In the following subsection, we take a represen-
3
10 tative example to explain our results observed in Fig. 5
and a mechanism of QAOA to go beyond the adiabatic
QAOA
TTSopt
paradigm.
2
10
B. Beyond the adiabatic mechanism: a case study
To understand the behavior of QAOA, we focus on

1 graph instances that are hard for adiabatic QA in this
10
10
1
10
2
10
3
10
4 subsection. In particular, we study a representative in-
stance to explain how QAOA as well as diabatic QA
TTSopt
QA can overcome the adiabatic limitations. As illustrated in
Fig. 6(a), we demonstrate that QAOA can learn to uti-
FIG. 5. (a) TTSopt QAOA versus the minimum spectral gap in lize diabatic transitions at anti-crossings to circumvent
QA , ∆min , for many random w3R graph instances. 1000 ran- difficulties caused by very small gaps in QA.
dom instances are generated for each graph vertex size N . The Fig. 6(b) shows a representative graph with N = 14,
maximum cutoff p is taken to be pmax = 50, 50, 40, 35, 30 for
whose minimum spectral gap ∆min < 10−3 . For this ex-
N = 10, 12, 14, 16, 18, respectively. Dashed line corresponds
to adiabatic QA run time of 1/∆2min predicted by Landau-
ample hard instance, we first numerically simulate the
Zener [38]. (b) TTSopt opt quantum annealing process. Fig. 6(c) shows populations
QAOA versus TTSQA for each random
graph instance. Dashed line indicates when the two are equal. in the ground state and low excited states at the end
The (Pearson’s) correlation coefficient between of the process for different annealing time T . Since the
QAOA TTS
and QA TTS is ρ ln(TTSopt opt
QAOA ), ln(TTSQA ) ≈ 0.91.
minimum gap is very small, the adiabatic condition re-
quires T & 1/∆2min ≈ 106 . The Landau-Zener formula for
the ground state population pGS = 1 − exp −cT ∆2min
fits well with our exact numerical simulation, where c is
TTSopt
QAOA for many random graph instances. For each a fitting parameter. However, we can clearly observe a
even vertex number from N = 10 to N = 18, we “bump” in the ground state population at annealing time
randomly generate 1000 instances of w3R graphs. In T ≈ 40. At such a time scale, the dynamics is clearly
Fig. 5(a), we plot the relationship between TTSoptQAOA non-adiabatic; we call this phenomenon a diabatic bump.
and the minimum gap ∆min in quantum annealing for This phenomenon has been observed earlier in Ref. [14–
each instance. Most of the random graphs have large 17] for other optimization problems.
gaps (∆min & 10−2 ), and we observe that the opti- Subsequently, we simulate QAOA on this hard in-
mal TTS follows the Landau-Zener prediction of 1/∆2min stance. As mentioned earlier, although QAOA opti-
for these graphs. This indicates that a quasi-adiabatic mizes energy instead of ground state overlap, substan-
parametrization of QAOA is the best when ∆min is rea- tial ground state population can still be obtained even
sonably large. Many graphs, however, exhibit very small for many hard graphs. Using our FOURIER heuristic
gaps (∆min . 10−3 ), and thus require exceedingly long strategy, various low-energy state populations of the out-
run time for adiabatic evolution. For some graphs, ∆min put state are shown for different levels p in Fig. 6(d).
is as small as 10−8 , which implies an adiabatic evolu- We observe that QAOA can achieve similar ground state
tion requires a run time T & 1016 . Nevertheless, we population as the diabatic bump at small p, and then
see that QAOA can find the solution for these hard in- substantially enhance it after p & 24.
stances faster than adiabatic QA. Remarkably, TTSopt
QAOA To better understand the mechanism of QAOA and
9
(a)
(c) 1
<latexit sha1_base64="zgnrvYOUpsvIxW07PeqD5QuG8Gc=">AAADrHicfVLbbtNAEN3GXIq5tfDIy4jIUnkgilMEPJabhIRARWraCtuq1ptJssp619pdN41c/wGPvMJ/8TdsHFfkQhnJ1uzMnDNnZyfNBTe22/291fJu3Lx1e/uOf/fe/QcPd3YfHRtVaIZ9poTSpyk1KLjEvuVW4GmukWapwJN08m6ePzlHbbiSR3aWY5LRkeRDzqh1oW+xxQtb7rFn1dlOu9vp1gabTtg4bdLY4dlu63s8UKzIUFomqDFR2M1tUlJtORNY+XFhMKdsQkcYOVfSDE1S1pIrCFxkAEOl3Sct1NFlREkzY2ZZ6iozasdmPTcP/isXFXb4Oim5zAuLki0aDQsBVsH8/jDgGpkVM+dQprnTCmxMNWXWTckPYLmPVLYek3E0n2egphKa1KqeNKt8PwjgrabPJ2jhyxUOgsCPJU6ZyjIqB2WcauoKqqiXlLGgciQQ2iFcQrsHsa7PrtV77uRA2pBdidggqqJwlcVB5xKuQdR9HeJyXvu3GXz6Xw9Wo/Y31brf/jLLquYpt2NQOWpq3QtzCSnaKeIaPz3H9TssMb5xa+uGC47gw0Xu3myh8Loxu/0N17d10znudULnf33RPnjZbPI2eUKekj0SklfkgHwkh6RPGJHkB/lJfnkd78iLvGRR2tpqMI/JinnDPy/zKr0=</latexit>
(e) 1
<latexit sha1_base64="BGgG8xnVsAv5ixYeloQOr7teCb0=">AAADrHicfVLbbtNAEN3GXIq5tfDIy4jIUnkgilMEPJabhIRARWraCtuq1ptJssp619pdN41c/wGPvMJ/8TdsHFfkQhnJ1uzMnDNnZyfNBTe22/291fJu3Lx1e/uOf/fe/QcPd3YfHRtVaIZ9poTSpyk1KLjEvuVW4GmukWapwJN08m6ePzlHbbiSR3aWY5LRkeRDzqh1oW+xxQtb7uGz6myn3e10a4NNJ2ycNmns8Gy39T0eKFZkKC0T1Jgo7OY2Kam2nAms/LgwmFM2oSOMnCtphiYpa8kVBC4ygKHS7pMW6ugyoqSZMbMsdZUZtWOznpsH/5WLCjt8nZRc5oVFyRaNhoUAq2B+fxhwjcyKmXMo09xpBTammjLrpuQHsNxHKluPyTiazzNQUwlNalVPmlW+HwTwVtPnE7Tw5QoHQeDHEqdMZRmVgzJONXUFVdRLylhQORII7RAuod2DWNdn1+o9d3IgbciuRGwQVVG4yuKgcwnXIOq+DnE5r/3bDD79rwerUfubat1vf5llVfOU2zGoHDW17oW5hBTtFHGNn57j+h2WGN+4tXXDBUfw4SJ3b7ZQeN2Y3f6G69u66Rz3OqHzv75oH7xsNnmbPCFPyR4JyStyQD6SQ9InjEjyg/wkv7yOd+RFXrIobW01mMdkxbzhHzZ1Kr8=</latexit>
Energy excited
state 0.8 0.8
population
0.6
ground 0.6 0.6
parameters
state
QAOA
0.4
QAOA 0.4 Diabatic Adiabatic QA 0.4
0.2
anti-crossing bump
Time 0.2 0.2 0
10 20 30 40
0 0
0 1 2 3 4 5 6 7
10 10 10 10 10 10 10 10 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
(b) 0.15 (d) 1 (f) 1
instantaneous eigenstate
0.76 0.38 <latexit sha1_base64="Gn+8THen3bVQiod5fl8UonChIlM=">AAADrHicfVLbbtNAEN3GXIq5tfDIy4jIUnkgilMEPJabhIRARWraCtuq1ptJssp619pdN41c/wGPvMJ/8TdsHFfkQhnJ1uzMnDNnZyfNBTe22/291fJu3Lx1e/uOf/fe/QcPd3YfHRtVaIZ9poTSpyk1KLjEvuVW4GmukWapwJN08m6ePzlHbbiSR3aWY5LRkeRDzqh1oW+xxQtb7g2eVWc77W6nWxtsOmHjtEljh2e7re/xQLEiQ2mZoMZEYTe3SUm15Uxg5ceFwZyyCR1h5FxJMzRJWUuuIHCRAQyVdp+0UEeXESXNjJllqavMqB2b9dw8+K9cVNjh66TkMi8sSrZoNCwEWAXz+8OAa2RWzJxDmeZOK7Ax1ZRZNyU/gOU+Utl6TMbRfJ6BmkpoUqt60qzy/SCAt5o+n6CFL1c4CAI/ljhlKsuoHJRxqqkrqKJeUsaCypFAaIdwCe0exLo+u1bvuZMDaUN2JWKDqIrCVRYHnUu4BlH3dYjLee3fZvDpfz1YjdrfVOt++8ssq5qn3I5B5aipdS/MJaRop4hr/PQc1++wxPjGra0bLjiCDxe5e7OFwuvG7PY3XN/WTee41wmd//VF++Bls8nb5Al5SvZISF6RA/KRHJI+YUSSH+Qn+eV1vCMv8pJFaWurwTwmK+YN/wAzNCq+</latexit>
<latexit sha1_base64="FpGO/COUmuuD0j6OLlMTRRSXg50=">AAADrHicfVLbbtNAEN3GXIq5tfDIywjLUnkgilMEPJabhIRARWraCtuq1ptxsoq9a+2um0au/4BHXuG/+Bs2jityoYxka3Zmzpmzs5MUGdem1/u91XFu3Lx1e/uOe/fe/QcPd3YfHWtZKoYDJjOpThOqMeMCB4abDE8LhTRPMjxJJu/m+ZNzVJpLcWRmBcY5HQmeckaNDX2LDF6Yai99Vp/teL1urzHYdILW8Uhrh2e7ne/RULIyR2FYRrUOg15h4ooqw1mGtRuVGgvKJnSEoXUFzVHHVSO5Bt9GhpBKZT9hoIkuIyqaaz3LE1uZUzPW67l58F+5sDTp67jioigNCrZolJYZGAnz+8OQK2Qmm1mHMsWtVmBjqigzdkquD8t9hDTNmLSl+TwDORXQplb1JHntur4PbxV9PkEDX65w4PtuJHDKZJ5TMayiRFFbUIf9uIoyKkYZghfAJXh9iFRztq3ecysHkpbsSsQGUR0GqywWOpdwDaLpaxGX89q/zeDT/3qwBrW/qdb+9pdZVjVPuRmDLFBRY1+YC0jQTBHX+Ok5rt9hifGNXVs7XLAEHy4K+2YLhdeN2e5vsL6tm85xvxtY/+sL7+Blu8nb5Al5SvZIQF6RA/KRHJIBYUSQH+Qn+eV0nSMndOJFaWerxTwmK+akfwA5tirA</latexit>
GS
0.16
0.52
0.8 1E
0.8 2E
0.9
population
3E
population
0.66 0.19
0.6 0.6 4E
0.56
0.43 0.7
0.76 0.4 0.4
0.82
0.37 0.89 0.2

0.47 0.2
0.76 0.16
0.14 0.69
0.36 0 0
0 10 20 30 40 50 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
FIG. 6. (a) A schematic of how QAOA and the interpolated annealing path can overcome the small minimum gap limitations
via diabatic transitions (purple line). Naive adiabatic quantum annealing path leads to excited states passing through the
anti-crossing point (green line). (b) An instance of weighted 3-regular graph that has a small minimum spectral gap along the
quantum annealing path given by Eq. (9). The optimal MaxCut configuration is shown with two vertex colors, and the black
(red) lines are the cut (uncut) edges. (c) Population in low excited states after the quantum annealing protocol with different
total time T . The black solid line follows the Landau-Zener formula for the ground state population, pGS = 1 − exp −cT ∆2min ,

where c is a fitting parameter. (d) Population in low excited states using QAOA at different level p. The FOURIER heuristic
strategy is used in the optimization. (e) Interpreting QAOA parameters (at p = 40) as a smooth quantum annealing path, via
linear interpolation according to Eq. (14). The annealing time parameter s = t/Tp , where Tp = pi=1 (|γi∗ | + |βi∗ |). The red
P
dotted line labels the location of anti-crossing where the gap is at its minimum, at which point f (s) ≈ 0.92. Inset shows the
original QAOA optimal parameters γi∗ and βi∗ for p = 40. (f) Instantaneous eigenstate population under the annealing path
given in (e). Note that the instantaneous ground state and first excited state swap at the anti-crossing point.
make a meaningful comparison with QA, we can interpret accumulates in the first excited state before the anti-
the QAOA parameters as a smooth annealing path. We crossing point, where the gap is at its minimum. Using
again interpret the sum of the variational
Pp parameters to a diabatic transition at the anti-crossing, the two states
be the total annealing time, i.e., Tp = i=1 (|γi |+|βi |). A swap populations, and a large ground state population is
smooth annealing path can be constructed from QAOA obtained in the end. We note that the final state popula-
optimal parameters as tion from the constructed annealing path differs slightly
from those of QAOA, due to Trotterization and interpo-
HQAOA (t) = −[f (t)HC + (1 − f (t))HB ], (14) lation, but the underlying mechanism is the same, which
is also responsible for the diabatic bump seen in Fig. 6(c).
 
i
X 1 γ∗ In addition to the conversion used in Eq. (14), we have
f ti = (|γj∗ | + |βj∗ |) − (|γi∗ | + |βi∗ |) = ∗ i ∗ ,
j=1
2 |γi | + |βi | tested a few other prescriptions to construct an annealing
path from QAOA parameters, and qualitative features do
where ti is chosen to be at the mid-point of each time not seem to change.
interval (γi∗ + βi∗ ). With the boundary conditions f (t = Hence, our results indicate that QAOA is closely re-
0) = 0, f (t = Tp ) = 1 and linear interpolation at other lated to a cleverly optimized diabatic QA path that
intermediate time t, we can convert QAOA parameters to can overcome limitations set by the adiabatic theorem.
a well-defined annealing path. We apply this conversion Through optimization, QAOA can find a good anneal-
to the QAOA optimal parameters at p = 40 [Fig. 6(e)]. ing path and exploit diabatic transitions to enhance
With this annealing path, we can follow the instanta- ground state population. This explains the observation
neous eigenstate population throughout the quantum an- in Fig. 5(a) that TTSopt
QAOA can be significantly shorter
nealing process [Fig. 6(f)]. In contrast to adiabatic QA, than the time required by the adiabatic algorithm. On
the state population leaks out of the ground state and the other hand, as seen in Fig. 5(b), TTSopt
QAOA is strongly
10
correlated with TTSoptQA : QAOA automatically finds a 0

10
good annealing path, which could be adiabatic or not,
depending on what is the best route for the specific prob-
lem instance.
The effective dynamics of QAOA for this specific in- 10
-1
stance, as we see in Fig. 6(f), can be understood mostly

by an effective two-level system (see Appendix D for first
excited
more discussions). In general, the energy spectrum can
10
-2 state
be more complex, and the dynamics may involve many
excited states, which may not be explainable by the sim-
ple schematic in Fig. 6(a). Nonetheless, QAOA can find
a suitable path via our heuristic optimization strategies 10
-3
0 1 2 3 4 5
even in more complicated cases [45]. 10 10 10 10 10 10
VI. CONSIDERATIONS FOR EXPERIMENTAL

FIG. 7. Full Monte-Carlo simulation of QAOA account-
IMPLEMENTATION
ing for measurement projection noise, on the example 14-
vertex instance studied in Fig. 6(b). For various optimiza-
In this section, we discuss some important consider- tion strategies, we keep track of approximation ratio ri =
ations for experimental realization. The framework of |Cuti |/|MaxCut| from the i-th measurement, and plot min-
QAOA is general and can be applied to various exper- imum fractional error 1 − ri found after M measurements,
imental platforms to solve combinatorial optimization averaged over many Monte-Carlo realizations. The blue solid
problems. Here, we again focus on the MaxCut prob- and red dashed lines correspond to QAOA optimized with
lem as a paradigmatic example, although it can also be the FOURIER strategy starting with an educated guess of
applied to solve other interesting problems [45, 46]. (~ ~ at p = 1 and p = 5, respectively. The orange dash-dot
γ , β)
line corresponds to QAOA optimized starting with random
guesses at p = 1. The three labelled dotted lines are results
A. Finite measurement samples from QA with total time T = 10, 102 , 103 .
So far, we have focused on understanding the best the-

oretically possible performance of QAOA, and assumed
perfect measurement precision of the objective function random choices of initial parameters starting at p = 1
in our numerical simulations. However, due to quan- perform much worse, which fails to find the MaxCut
tum fluctuations (i.e., projection noise) in actual experi- solution until 103 –104 measurements have been made.
ments, the precision is finite since it is obtained via aver- Moreover, when we compare QAOA to QA with various
aging over finitely many measurement outcomes that can annealing time, it appears that the choice of annealing
only take on discrete values. Hence, in practice, there is time T = 100 can perform just as well as QAOA on this
a trade-off between measurement cost and optimization instance. Nevertheless, running QAOA at level p = 5
quality: finding good optimum requires good precision is still more advantageous than QA at T = 100, when
at the cost of large number of measurements [47]. Ad- coherence time is limited.
ditionally, large variance in the objective function value
demands more measurements, but may help improve the We remark that our simulation is only limited to small-
chances of finding near-optimal MaxCut configurations. size instances, and the good performance of QAOA and
Here, we study the effect of measurement projection QA we observe is complicated by the small but signif-
noise with a full Monte-Carlo simulation of QAOA on icant ground state population from generic annealing
some example graphs, where objective function is eval- schedules. Since it often takes 102 measurements to ob-
uated by repeated projective measurements until its er- tain a sufficiently precise estimate of the objective func-
ror is below a threshold. More implementation details of tion, a ground state probability of & 10−2 would mean
this numerical simulation are discussed in Appendix F. In that one can find the ground state without much pa-
Fig. 7, we present the Monte-Carlo simulation result for rameter optimization. Nevertheless, as quantum com-
the example instance studied earlier in Fig. 6(b). Here, puters with 102 ∼ 103 qubits begin to come online, it
we simulate QAOA by starting at either level p = 1 or will be interesting to see how QAOA performs on much
p = 5, and increasing to higher p using our FOURIER larger instances where the ground state probability from
heuristics. The initial parameters at p = 1 and p = 5, re- generic QAOA/QA protocol is expected to be exponen-
spectively, are chosen based on known optima found for tially small, whereas the number of measurements neces-
smaller size instances. We see that QAOA can find the sary for optimization grows only polynomially with prob-
MaxCut solution in ∼10–102 measurements, and starting lem size. The results here indicate that the parameter
our optimization at intermediate level (p = 5) is better pattern and our heuristic strategies are practically useful
than starting at the lowest level (p = 1). In comparison, guidelines in realistic implementation of QAOA.
11
B. Implementation for large problem sizes Original graph G Vertex renumbering
1 2 3 4 5 1 2 3 5 4
As experiments begin to test solving the MaxCut prob- 0.4
Adjacency matrix |ri
0
lems with quantum machines [6, 8], limited quantum co- 0.3
<latexit sha1_base64="EV6lbV0Z+aFzGLKCEyaMuQoYjl8=">AAADqXicfVLbbtNAEN3GXIq5tfDIy4jIEhIislMk+tgWkJAQqJVIExFH1XozSVZZ71q764bI9R/wwit8GX/DxnFFLpSRbM3OzDlzdnaSTHBjw/D3TsO7dfvO3d17/v0HDx893tt/cm5Urhl2mBJK9xJqUHCJHcutwF6mkaaJwG4yfbvIdy9RG67kFzvPcJDSseQjzqh1oW48RVvo8mKvGbbCymDbiWqnSWo7vdhvfI+HiuUpSssENaYfhZkdFFRbzgSWfpwbzCib0jH2nStpimZQVHpLCFxkCCOl3SctVNFVREFTY+Zp4ipTaidmM7cI/ivXz+3ocFBwmeUWJVs2GuUCrILF5WHINTIr5s6hTHOnFdiEasqsG5EfwGofqWw1I+NoPs1BzSTUqXU9SVr6fhDAiaav3DTh8zUOgsCPJc6YSlMqh0WcaOoKyn57UMSCyrFAaEZwBc02xLo6u1bvuJMDSU12LWKLqOxH6ywOupBwA6Lq6xBXi9q/zeDj/3qwCnWwrdb9DlZZ1jXPuJ2AylBT616YS0jQzhA3+Oklbt5hhfHY7awbLjiC998y92ZLhTeN2e1vtLmt2855uxWFrejsdfPosN7kXfKMPCcvSETekCPygZySDmFkSn6Qn+SX99I783re12VpY6fGPCVr5rE/Oesp3g==</latexit>
⇡
<latexit sha1_base64="nu9rZhS/7267RpKPYVvsc7A9ybY=">AAADqHicfVLbbtNAEN3GXIq5tIVHXkZElhASkZ0i0ccCRUKqQEEiSVEcVevNJFmy3rV21w2Rmz/ggVf4s/4NG8cVuVBGsjU7M+fM2dlJMsGNDcOrnZp36/adu7v3/PsPHj7a2z943DEq1wzbTAmlzxJqUHCJbcutwLNMI00Tgd1k8m6R716gNlzJL3aWYT+lI8mHnFHrQp34BIWl5/v1sBGWBttOVDl1Ulnr/KD2Ix4olqcoLRPUmF4UZrZfUG05Ezj349xgRtmEjrDnXElTNP2ilDuHwEUGMFTafdJCGV1FFDQ1ZpYmrjKldmw2c4vgv3K93A6P+gWXWW5RsmWjYS7AKljcHQZcI7Ni5hzKNHdagY2ppsy6CfkBrPaRypYjMo7m4wzUVEKVWteTpHPfDwJ4q+nLCVr4dI2DIPBjiVOm0pTKQREnmrqCea/ZL2JB5Ugg1CO4hHoTYl2eXasT7uRAUpFdi9gimveidRYHXUi4AVH2dYjLRe3fZnD6vx6sRB1uq3W/w1WWdc1TbsegMtTUuhfmEhK0U8QNfnqBm3dYYXzjVtYNFxzB+++Ze7OlwpvG7PY32tzWbafTbERhI/r8qn58VG3yLnlKnpHnJCKvyTH5QFqkTRj5Rn6SX+S398JreV3v67K0tlNhnpA185I/k5kpEA==</latexit>
1 <latexit sha1_base64="KLmMC1a0zbyE7x8d8C7kgRNe0oQ=">AAADpXicfVLbbtNAEN3GXIq5tIVHXkZEFrwQ2SkSfSw3CQkVFdGkleIoWm8mySrrXWt33RC5+QN4hW/jb1g7rsiFMpKt2Zk5Z87OTpIJbmwY/t5peLdu37m7e8+//+Dho739g8ddo3LNsMOUUPoioQYFl9ix3Aq8yDTSNBF4nkzflfnzS9SGK3lm5xn2UzqWfMQZtS70Nc74YL8ZtsLKYNuJaqdJajsdHDS+x0PF8hSlZYIa04vCzPYLqi1nAhd+nBvMKJvSMfacK2mKpl9UWhcQuMgQRkq7T1qooquIgqbGzNPEVabUTsxmrgz+K9fL7eioX3CZ5RYlWzYa5QKsgvLiMOQamRVz51CmudMKbEI1ZdaNxw9gtY9UtpqPcTQnc1AzCXVqXU+SLnw/COCtpi+naOHzNQ6CwI8lzphKUyqHRZxo6goWvXa/iAWVY4HQjOAKmm2IdXV2rd5zJweSmuxaxBbRohetszhoKeEGRNXXIa7K2r/N4NP/erAKdbit1v0OV1nWNc+4nYDKUFPrXphLSNDOEDf46SVu3mGF8Y3bVzdccAQfvmXuzZYKbxqz299oc1u3nW67FYWt6Mur5vFRvcm75Cl5Rl6QiLwmx+QjOSUdwsiY/CA/yS/vuXfinXndZWljp8Y8IWvmDf4ApRAn4Q==</latexit>
<latexit
probability
100
herence time and graph connectivity will be among the
0.2
⇡ 3
⌦
biggest challenges. In terms of coherence time, QAOA is 200
<latexit sha1_base64="E8odiej6R+uKF6NaIRN4nteXfrw=">AAADqHicfVLbjtMwEPU2XJZw2QuPvIyoIiEkqqSLxD4uLEhICCgSbRc11cpxp62pY0e2s6XK9g944BX+jL/BTbOiF5aREo1n5pw5Hk+SCW5sGP7eqXk3bt66vXvHv3vv/oO9/YPDjlG5ZthmSih9llCDgktsW24FnmUaaZoI7CaT00W+e4HacCU/21mG/ZSOJB9yRq0LdeKPKY7o+X49bISlwbYTVU6dVNY6P6h9jweK5SlKywQ1pheFme0XVFvOBM79ODeYUTahI+w5V9IUTb8o5c4hcJEBDJV2n7RQRlcRBU2NmaWJq0ypHZvN3CL4r1wvt8PjfsFllluUbNlomAuwChZ3hwHXyKyYOYcyzZ1WYGOqKbNuQn4Aq32ksuWIjKN5PwM1lVCl1vUk6dz3gwBeafpsghY+XOEgCPxY4pSpNKVyUMSJpq5g3mv2i1hQORII9Qguod6EWJdn1+o1d3IgqciuRGwRzXvROouDLiRcgyj7OsTlovZvM3j3vx6sRB1tq3W/o1WWdc1TbsegMtTUuhfmEhK0U8QNfnqBm3dYYXzpVtYNFxzBm2+Ze7OlwuvG7PY32tzWbafTbERhI/r0vH5yXG3yLnlEHpMnJCIvyAl5S1qkTRj5Sn6Qn+SX99RreV3vy7K0tlNhHpI185I/kIMpDw==</latexit>
<latexit
<latexit sha1_base64="KLmMC1a0zbyE7x8d8C7kgRNe0oQ=">AAADpXicfVLbbtNAEN3GXIq5tIVHXkZEFrwQ2SkSfSw3CQkVFdGkleIoWm8mySrrXWt33RC5+QN4hW/jb1g7rsiFMpKt2Zk5Z87OTpIJbmwY/t5peLdu37m7e8+//+Dho739g8ddo3LNsMOUUPoioQYFl9ix3Aq8yDTSNBF4nkzflfnzS9SGK3lm5xn2UzqWfMQZtS70Nc74YL8ZtsLKYNuJaqdJajsdHDS+x0PF8hSlZYIa04vCzPYLqi1nAhd+nBvMKJvSMfacK2mKpl9UWhcQuMgQRkq7T1qooquIgqbGzNPEVabUTsxmrgz+K9fL7eioX3CZ5RYlWzYa5QKsgvLiMOQamRVz51CmudMKbEI1ZdaNxw9gtY9UtpqPcTQnc1AzCXVqXU+SLnw/COCtpi+naOHzNQ6CwI8lzphKUyqHRZxo6goWvXa/iAWVY4HQjOAKmm2IdXV2rd5zJweSmuxaxBbRohetszhoKeEGRNXXIa7K2r/N4NP/erAKdbit1v0OV1nWNc+4nYDKUFPrXphLSNDOEDf46SVu3mGF8Y3bVzdccAQfvmXuzZYKbxqz299oc1u3nW67FYWt6Mur5vFRvcm75Cl5Rl6QiLwmx+QjOSUdwsiY/CA/yS/vuXfinXndZWljp8Y8IWvmDf4ApRAn4Q==</latexit>
<latexit
highly advantageous: the hybrid nature of QAOA as well 0.1

300 |1i <latexit sha1_base64="Ftny+aNI64p5ZTP3O9AODiA2YtA=">AAADqXicfVLbbtNAEN3GXIq5tfDIy4jIEhIislMk+tgWkJAQqJVIExFH1XozSVZZ71q764bI9R/wwit8GX/DxnFFLpSRbM3OzDlzdnaSTHBjw/D3TsO7dfvO3d17/v0HDx893tt/cm5Urhl2mBJK9xJqUHCJHcutwF6mkaaJwG4yfbvIdy9RG67kFzvPcJDSseQjzqh1oW48RVtE5cVeM2yFlcG2E9VOk9R2erHf+B4PFctTlJYJakw/CjM7KKi2nAks/Tg3mFE2pWPsO1fSFM2gqPSWELjIEEZKu09aqKKriIKmxszTxFWm1E7MZm4R/Feun9vR4aDgMsstSrZsNMoFWAWLy8OQa2RWzJ1DmeZOK7AJ1ZRZNyI/gNU+UtlqRsbRfJqDmkmoU+t6krT0/SCAE01fuWnC52scBIEfS5wxlaZUDos40dQVlP32oIgFlWOB0IzgCpptiHV1dq3ecScHkprsWsQWUdmP1lkcdCHhBkTV1yGuFrV/m8HH//VgFepgW637HayyrGuecTsBlaGm1r0wl5CgnSFu8NNL3LzDCuOx21k3XHAE779l7s2WCm8as9vfaHNbt53zdisKW9HZ6+bRYb3Ju+QZeU5ekIi8IUfkAzklHcLIlPwgP8kv76V35vW8r8vSxk6NeUrWzGN/AGacKZ0=</latexit>
as its short- and intermediate-depth circuit parametriza-

0
400
0 100 200 300 400 |0i
<latexit sha1_base64="hirVVigZwQvAH+5S//1FRUlUyrQ=">AAADqXicfVLbbtNAEN3GXIq5tfDIy4jIEhIislMk+tgWkJAQqJVIExFH1XozSVZZ71q764bI9R/wwit8GX/DxnFFLpSRbM3OzDlzdnaSTHBjw/D3TsO7dfvO3d17/v0HDx893tt/cm5Urhl2mBJK9xJqUHCJHcutwF6mkaaJwG4yfbvIdy9RG67kFzvPcJDSseQjzqh1oW48RVuE5cVeM2yFlcG2E9VOk9R2erHf+B4PFctTlJYJakw/CjM7KKi2nAks/Tg3mFE2pWPsO1fSFM2gqPSWELjIEEZKu09aqKKriIKmxszTxFWm1E7MZm4R/Feun9vR4aDgMsstSrZsNMoFWAWLy8OQa2RWzJ1DmeZOK7AJ1ZRZNyI/gNU+UtlqRsbRfJqDmkmoU+t6krT0/SCAE01fuWnC52scBIEfS5wxlaZUDos40dQVlP32oIgFlWOB0IzgCpptiHV1dq3ecScHkprsWsQWUdmP1lkcdCHhBkTV1yGuFrV/m8HH//VgFepgW637HayyrGuecTsBlaGm1r0wl5CgnSFu8NNL3LzDCuOx21k3XHAE779l7s2WCm8as9vfaHNbt53zdisKW9HZ6+bRYb3Ju+QZeU5ekIi8IUfkAzklHcLIlPwgP8kv76V35vW8r8vSxk6NeUrWzGN/AGNcKZw=</latexit>
tion makes it ideal for near-term quantum devices. In 50 100 150 200 250 300 350 400
bandwidth
addition, we have demonstrated that QAOA is not gener-
ally limited by the small spectral gaps, which raises hopes FIG. 8. (a) Vertex renumbering to reduce the graph band-
to (approximately) solve interesting problems within the width. Top panel: an example of vertex renumbering for a
coherence time. 5-vertex graph. Bottom panel: distribution of graph band-
What would be the necessary problem size to explore widths for 1000 random 3-regular graphs with N = 400
a meaningful quantum advantage? We note that one of each. The blue (orange) bars are the bandwidth before (af-
the leading exact classical solver, the BiqMac solver [48], ter) vertex renumbering. Inset shows the sparsity pattern
is able to solve MaxCut exactly up to N . 100, but of the adjacency matrix before and after vertex renumbering
takes a long time (more than an hour) for larger prob- for one particular graph. (b) A protocol to use Rydberg-
lem sizes. A fast heuristic algorithm, the breakout local blockade controlled gate to implement the interaction term
exp(−iγσiz σjz ). By choosing a proper gate time for the second
search algorithm [49], can find the MaxCut solution with √
a high probability for problem size of a few hundreds, step t = 2π/ Ω2 + ∆2 , one does not accumulate popula-
tion on the Rydberg level |ri. With tunable coupling strength
although the solution is not guaranteed. Hence, a Max-
Ω and detuning ∆, one can control the interaction time γ.
Cut problem with a few hundred vertices will be an in-
teresting regime to benchmark quantum algorithms. In
terms of approximation, we have noted earlier that the
polynomial-time Goemans-Willamson algorithm has an NP-hard, but good heuristic algorithms have been devel-
approximation ratio guarantee of r∗ ≈ 0.878. It will be oped to reduce the graph bandwidth. Fig. 8(a) shows a
also interesting to find out if QAOA is able to achieve simple example of bandwidth reduction. The top panel
a better approximation ratio for some problem instances illustrates the vertex renumbering with a 5-vertex graph.
where the exact solution is not obtainable, in this regime The bottom panel shows the histogram of graph band-
of hundreds of vertices. widths for 1000 random 3-regular graphs of N = 400
each. Using the Cuthill-McKee algorithm [50], the graph
We now discuss a few considerations that put these
bandwidth can be reduced to around B ≈ 100. While
large-size problems in the experimentally feasible regime
this still requires quite a long interaction range in 1D,
on near-term quantum systems.
the bandwidth problem can also be generalized to higher
dimensions. In 2D arrangements, we √ then expect the di-
ameter of interaction to be B2D ∼ 100 for N = 400
1. Reducing interaction range
3-regular graphs [51], which seems within reach for near-
term quantum devices. A detailed study of the 2D band-
In a quantum experiment, each vertex can be rep- width problem is beyond the scope of this work. Alter-
resented by a qubit. For a large problem size, a ma- natively, one can make use of a general construction to
jor challenge to encode general graphs will be the nec- encode any long-range interactions to local fields [52], al-
essary range and versatility of the interaction patterns though it requires additional physical qubits and gauge
(between qubits). The embedding of a random graph constraints. Apart from implementing arbitrary graphs
into a physical implementation with a 1D or 2D ge- in full generality, one can also restrict to special graphs
ometry may require very long-range interactions. By that exhibit some geometric structures. For example,
relabelling the graph vertices, one can actually reduce unit disk graphs are geometric graphs in the 2D plane,
the required range of interactions. This can be formu- where vertices are connected by an edge only if they are
lated as the graph bandwidth problem: Given a graph within a unit distance. These graphs can be more natu-
G = (V, E) with N vertices, a vertex numbering is a bi- rally encoded into 2D physical implementations, and the
jective map from vertices to distinct integers, f : V → MaxCut problem is still NP-hard on unit disk graphs [53].
{1, 2, · · ·, N }. The bandwidth of a vertex numbering f
is Bf (G) = max{|f (u) − f (v)| : (u, v) ∈ E(G)}, which
can be understood as the length of the longest edge (in
1D). The graph bandwidth problem is then to find the 2. Example Implementation with Rydberg Atoms
minimum bandwidth among all vertex numberings, i.e.,
B(G) = minf Bf (G); namely, it is to minimize the length The above discussion has been experimental-platform
of the longest edge by vertex renumbering. independent, and is applicable to any state-of-the-art
In general, finding the minimum graph bandwidth is platforms, such as neutral Rydberg atoms [18, 19, 54–56],
12
trapped ions [7, 57–59], and superconducting qubits [5, patterns in the QAOA optimal parameters, we developed
6, 60, 61]. As an example, we briefly describe a feasible heuristic strategies for initializing optimization so as to
implementation of QAOA with neutral atoms interact- efficiently find quasi-optimal parameters in O(poly(p))
ing via Rydberg excitations, where high-fidelity entangle- time. In contrast, optimization with the standard ran-
ment has been recently demonstrated [62]. We can use dom initialization strategy that explores the entire pa-
the hyperfine ground states in each atom to encode the rameter space needs 2O(p) runs to obtain an equally
qubit states |0i and |1i, and the |1i state can be excited good solution. We benchmarked, using these heuristic
to the Rydberg level |ri to induce interactions. The qubit optimization strategies, the performance of QAOA up
PN x
rotating term, exp −iβ j=1 σj , can be implemented to p ≤ 50. On average, the performance characterized
by the approximation ratio was observed to improve ex-
straightforwardly by a global driving P beam with tunable ponentially (or stretched-exponentially) for random un-
durations. The interaction terms hi,ji σiz σjz can be im-
weighted (or weighted) graphs. Focusing on hard graph
plemented stroboscopically for general graphs; this can instances where adiabatic QA fails due to extremely small
be realized by a Rydberg-blockade controlled gate [19], spectral gaps, we found that QAOA could learn via op-
as illustrated in Fig. 8(b). By controlling the coupling timization a diabatic path to achieve significantly higher
strength Ω, detuning ∆, and the gate time, together with success probabilities to find the MaxCut solution. A met-
single-qubit rotations, one can implement exp(−iγσiz σjz ), ric taking into account of both the success probability and
which can then be repeated for each interacting pair. the run time indicates QAOA may not be limited by small
An additional major advantage of the Rydberg-blockade spectral gaps. Finally, we provided a detailed resource
mechanism is its ability to perform multi-qubit collective analysis for experimental implementation of QAOA on
gates in parallel [19, 63]. This can reduce the number MaxCut and proposed a neutral-atom realization where
of two-qubit operation steps from the number of edges problem sizes of a few hundred atoms are feasible in the
to the number of vertices, N , which means a factor of near future.
N reduction for dense graphs with ∼N 2 edges. While
the falloff of Rydberg interactions may limit the distance As we have benchmarked via random MaxCut in-
two qubits can interact, MaxCut problems of interesting stances, our simple heuristic optimization strategies work
sizes can still be implemented by vertex renumbering or very well. Nevertheless, more sophisticated methods
focusing on unit disk graphs, as discussed above. could be developed to improve the performance and ro-
bustness. One possibility would be using machine learn-
For generic problems of 400-vertex 3-regular graphs,
ing techniques to automatically learn and make use of the
we expect the necessary interaction range to be 5 atoms
optimal parameter patterns to develop even more efficient
in 2D; assuming minimum inter-atom separation of 2 µm,
parametrization and strategies (see e.g., [67]). Another
this means an interaction radius of 10 µm, which is re-
important but unsettled problem is assessing a reliable
alizable with high Rydberg levels [19]. Given realistic
system-size scaling for QAOA. On average, we observed
estimates of coupling strength Ω ∼ 2π × 10-100 MHz
(stretched) exponential improvement with level p, up to
and single-qubit coherence time τ ∼ 200 µs (limited by
N = 22. It remains open whether the same scaling will
Rydberg level lifetime), we expect with high-fidelity con-
persist for larger system size. For the hard instances
trol, the error per two-qubit gate can be made roughly
we generated that have exceedingly small spectral gaps,
(Ωτ )−1 ∼ 10−3 -10−4 . For 400-vertex 3-regular graphs, we
QAOA is able to overcome the adiabatic limitations in all
can implement QAOA of level p ' Ωτ /N ∼ 25 with a 2D
cases examined; it remains to be seen how this behavior
array of neutral atoms. We note that these are conser-
could extrapolate to larger problems. An experimental
vative estimates since we do not consider advanced con-
implementation with near-term quantum devices will be
trol techniques such as pulse-shaping, and require less
able to push the limit of our understanding and serve
than one error in the entire quantum circuit; it is pos-
as a litmus test for genuine quantum speedup in solving
sible that QAOA is not sensitive to some of the imper-
practical problems.
fections. Hence, we envision that soon, QAOA can be
benchmarked against classical [24, 48, 49] and semiclas- Besides MaxCut, another interesting optimization
sical [64–66] solvers on large problem sizes with near-term problem is the Maximum Independent Set (MIS) prob-
quantum devices. lem, which is also NP-hard and has many real-world ap-
plications [68, 69]. In a separate work [45], we show
that the MIS problem can be naturally encoded into the
ground state of neutral atoms interacting via Rydberg ex-
VII. CONCLUSION AND OUTLOOK citations, with minimal overhead on the hardware. The
methodology we have developed here for QAOA on Max-
In summary, we have studied the performance, non- Cut can be directly adapted to the MIS problem, where
adiabatic mechanism, and practical implementation of we observe similar parameter patterns and non-adiabatic
QAOA applied to MaxCut problems. Our results provide mechanisms of QAOA; this is discussed in more detail in
important insights into the performance of QAOA, and Appendix H. With such rapid development in near-term
suggest promising strategies for its practical realization quantum computer, we will soon be able to witness ex-
on near term quantum devices. Based on the observed perimental tests of the capability of quantum algorithms
13
to solve practically interesting problems. Research in Science. H.P. is supported by the NSF
through a grant for the Institute for Theoretical Atomic,
Molecular, and Optical Physics at Harvard University
and the Smithsonian Astrophysical Observatory. Some
ACKNOWLEDGMENTS of the computations in this paper were performed on
the Odyssey cluster supported by the FAS Division
We thank Edward Farhi, Aram Harrow, Peter Zoller, of Science, Research Computing Group at Harvard
Ignacio Cirac, Jeffrey Goldstone, Sam Gutmann, Wen University.
Wei Ho, and Dries Sels for fruitful discussions. S.-
T.W. in addition would like to thank Daniel Lidar, Note added.—After the completion of this work, we be-
Matthias Troyer, and Johannes Otterbach for insightful came aware of a related work appearing in Ref. [70]. In
discussions and suggestions during the Aspen Winter the preprint, they trained the QAOA variational parame-
Conference on Advances in Quantum Algorithms and ters on a batch of graph instances and compared QAOA’s
Computation. This work was supported through the performance with the classical Goemans-Williamson al-
National Science Foundation (NSF), the Center for gorithm on small-size MaxCut problems. Similar param-
Ultracold Atoms, the Air Force Office of Scientific eter shapes were found, but Ref. [70] did not make use of
Research via the MURI, the Vannevar Bush Faculty any observed patterns to design optimization strategies.
Fellowship, DOE, and Google Research Award. S.C. We, in addition, discovered non-adiabatic mechanisms of
acknowledges support from the Miller Institute for Basic QAOA, which is not studied in Ref. [70].
[1] John Preskill, “Quantum Computing in the NISQ era and Wilkes, Thomas Loke, Sean O’Gara, Laurent Kling, Gra-
beyond,” Quantum 2, 79 (2018). ham D. Marshall, Raffaele Santagati, Timothy C. Ralph,
[2] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann, “A Jingbo B. Wang, Jeremy L. O’Brien, Mark G. Thompson,
quantum approximate optimization algorithm,” (2014), and Jonathan C. F. Matthews, “Large-scale silicon quan-
arXiv:1411.4028. tum photonics implementing arbitrary two-qubit process-
[3] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man- ing,” Nature Photonics 12, 534–539 (2018).
Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru- [9] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann, “A
Guzik, and Jeremy L. O’Brien, “A variational eigenvalue Quantum Approximate Optimization Algorithm Applied
solver on a photonic quantum processor,” Nature Com- to a Bounded Occurrence Constraint Problem,” (2014),
munications 5, 4213 (2014). arXiv:1412.6062.
[4] Nikolaj Moll, Panagiotis Barkoutsos, Lev S Bishop, [10] Edward Farhi and Aram W Harrow, “Quantum
Jerry M Chow, Andrew Cross, Daniel J Egger, Stefan Fil- supremacy through the quantum approximate optimiza-
ipp, Andreas Fuhrer, Jay M Gambetta, Marc Ganzhorn, tion algorithm,” (2016), arXiv:1602.07674.
Abhinav Kandala, Antonio Mezzacapo, Peter Müller, [11] M. B. Hastings, “Classical and Quantum Bounded Depth
Walter Riess, Gian Salis, John Smolin, Ivano Tavernelli, Approximation Algorithms,” , arXiv:1905.07047 (2019),
and Kristan Temme, “Quantum optimization using varia- arXiv:1905.07047.
tional algorithms on near-term quantum devices,” Quan- [12] Sergey Bravyi, Alexander Kliesch, Robert Koenig, and
tum Science and Technology 3, 030503 (2018). Eugene Tang, “Obstacles to state preparation and varia-
[5] Abhinav Kandala, Antonio Mezzacapo, Kristan Temme, tional optimization from symmetry protection,” (2019),
Maika Takita, Markus Brink, Jerry M. Chow, and arXiv:1910.08980.
Jay M. Gambetta, “Hardware-efficient variational quan- [13] Jarrod R. McClean, Sergio Boixo, Vadim N. Smelyanskiy,
tum eigensolver for small molecules and quantum mag- Ryan Babbush, and Hartmut Neven, “Barren plateaus
nets,” Nature 549, 242 (2017). in quantum neural network training landscapes,” Nature
[6] J. S. Otterbach, R. Manenti, N. Alidoust, A. Best- Communications 9, 4812 (2018).
wick, M. Block, B. Bloom, S. Caldwell, N. Didier, [14] Elizabeth Crosson, Edward Farhi, Cedric Yen-Yu Lin,
E. Schuyler Fried, S. Hong, P. Karalekas, C. B. Os- Han-Hsuan Lin, and Peter Shor, “Different Strategies for
born, A. Papageorge, E. C. Peterson, G. Prawiroat- Optimization Using the Quantum Adiabatic Algorithm,”
modjo, N. Rubin, C. A. Ryan, D. Scarabelli, M. Scheer, (2014), arXiv:1401.7320.
E. A. Sete, P. Sivarajah, R. S. Smith, A. Staley, N. Tezak, [15] Siddharth Muthukrishnan, Tameem Albash, and
W. J. Zeng, A. Hudson, B. R. Johnson, M. Reagor, Daniel A. Lidar, “Tunneling and speedup in quantum op-
M. P. da Silva, and C. Rigetti, “Unsupervised Machine timization for permutation-symmetric problems,” Phys.
Learning on a Hybrid Quantum Computer,” (2017), Rev. X 6, 031010 (2016).
arXiv:1712.05771. [16] Layla Hormozi, Ethan W. Brown, Giuseppe Carleo, and
[7] C. Kokail, C. Maier, R. van Bijnen, T. Brydges, M. K. Matthias Troyer, “Nonstoquastic hamiltonians and quan-
Joshi, P. Jurcevic, C. A. Muschik, P. Silvi, R. Blatt, C. F. tum annealing of an ising spin glass,” Phys. Rev. B 95,
Roos, and P. Zoller, “Self-Verifying Variational Quan- 184416 (2017).
tum Simulation of the Lattice Schwinger Model,” (2018), [17] Tameem Albash and Daniel A. Lidar, “Adiabatic quan-
arXiv:1810.03421. tum computation,” Rev. Mod. Phys. 90, 015002 (2018).
[8] Xiaogang Qiang, Xiaoqi Zhou, Jianwei Wang, Callum M. [18] Hannes Bernien, Sylvain Schwartz, Alexander Keesling,
14
Harry Levine, Ahmed Omran, Hannes Pichler, Soon- [−2π, 2π) for w3R graphs. Although γi0 can meaningfully
won Choi, Alexander S. Zibrov, Manuel Endres, Markus take values beyond the restricted range γi0 ∈ [−2π, 2π)
Greiner, Vladan Vuletić, and Mikhail D. Lukin, “Probing for w3R graph, we find that broadening the range does
many-body dynamics on a 51-atom quantum simulator,” not improve performance. The ranges of the output pa-
Nature 551, 579 (2017). rameters are not restricted in our unconstrained opti-
[19] M. Saffman, T. G. Walker, and K. Mølmer, “Quantum mization routine.
information with rydberg atoms,” Rev. Mod. Phys. 82, [35] We denote the best of all local optima optimized from
2313–2363 (2010). k random initial points (seeds) to be (~ ~ B[k] ).
γ B[k] , β
[20] Christos H Papadimitriou and Kenneth Steiglitz, Com- When the same optimum (~ γ B[k] ~ B[k]
,β ) is found from
binatorial optimization: algorithms and complexity many different seeds and continues to yield the best
(Courier Corporation, 1998). ~ B[k] ) as we increase the number of seeds k, we
γ B[k] , β
Fp (~
[21] Bernhard Korte, Jens Vygen, B Korte, and J Vygen, ~ ∗) =
then claim that it is a global optimum, i.e., (~ γ∗, β
Combinatorial optimization, Vol. 2 (Springer, 2012). B[k] ~ B[k]
[22] Johan Håstad, “Some optimal inapproximability results,” limk→∞ (~ γ ,β ).
J. ACM 48, 798–859 (2001). [36] J A Nelder and R Mead, “A Simplex Method for Func-
[23] Piotr Berman and Marek Karpinski, “On Some tion Minimization,” The Computer Journal 7, 308–313
Tighter Inapproximability Results (Extended Abstract),” (1965).
(Springer, Berlin, Heidelberg, 1999) pp. 200–209. [37] Peter I. Frazier, “A Tutorial on Bayesian Optimization,”
[24] Michel X. Goemans and David P. Williamson, “Improved (2018), arXiv:1807.02811.
approximation algorithms for maximum cut and satis- [38] Landau and L. D., “Zur Theorie der Energieubertragung
fiability problems using semidefinite programming,” J. II,” Z. Sowjetunion 2, 46–51 (1932); C. Zener, “Non-
ACM 42, 1115–1145 (1995). Adiabatic Crossing of Energy Levels,” Proc. R. Soc. Lon-
[25] Eran Halperin, Dror Livnat, and Uri Zwick, “MAX don, Ser. A 137, 696–702 (1932).
CUT in cubic graphs,” Journal of Algorithms 53, 169– [39] Tadashi Kadowaki and Hidetoshi Nishimori, “Quantum
185 (2004). annealing in the transverse ising model,” Phys. Rev. E
[26] Zhihui Wang, Stuart Hadfield, Zhang Jiang, and 58, 5355–5363 (1998).
Eleanor G. Rieffel, “Quantum approximate optimization [40] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, Joshua
algorithm for maxcut: A fermionic view,” Phys. Rev. A Lapan, Andrew Lundgren, and Daniel Preda, “A quan-
97, 022304 (2018). tum adiabatic evolution algorithm applied to random in-
[27] Dave Wecker, Matthew B. Hastings, and Matthias stances of an np-complete problem,” Science 292, 472–
Troyer, “Training a quantum optimizer,” Phys. Rev. A 475 (2001).
94, 022309 (2016). [41] Sergio Boixo, Troels F. Rønnow, Sergei V. Isakov, Zhihui
[28] Zhi-Cheng Yang, Armin Rahmani, Alireza Shabani, Wang, David Wecker, Daniel A. Lidar, John M. Martinis,
Hartmut Neven, and Claudio Chamon, “Optimizing vari- and Matthias Troyer, “Evidence for quantum annealing
ational quantum algorithms using pontryagin’s minimum with more than one hundred qubits,” Nature Physics 10,
principle,” Phys. Rev. X 7, 021027 (2017). 218 (2014).
[29] W. W. Ho and T. H. Hsieh, “Efficient unitary [42] Troels F. Rønnow, Zhihui Wang, Joshua Job, Sergio
preparation of non-trivial quantum states,” (2018), Boixo, Sergei V. Isakov, David Wecker, John M. Mar-
arXiv:1803.00026. tinis, Daniel A. Lidar, and Matthias Troyer, “Defining
[30] Zhang Jiang, Eleanor G. Rieffel, and Zhihui Wang, and detecting quantum speedup,” Science 345, 420–424
“Near-optimal quantum circuit for grover’s unstructured (2014).
search using a transverse field,” Phys. Rev. A 95, 062317 [43] To be consistent with the language of QA, here we use
(2017). the terminology of ground state and low excited states
[31] E. R. Anschuetz, J. P. Olson, A. Aspuru-Guzik, and of −HC instead of referring to them as highest excited
Y. Cao, “Variational Quantum Factoring,” (2018), states in the MaxCut language.
arXiv:1808.08927. [44] This could change with different normalizations of the
[32] The approximation ratio r = (2p + 1)/(2p + 2) is found Hamiltonians HC and HB , but our qualitative results re-
for an infinite ring. For a finite ring with N vertices, main the same. The physical limitation in the experiment
numerical calculations show that for p < bN/2c one has is typically the interaction strength in HC .
[45] H. Pichler, S.-T. Wang, L. Zhou, S. Choi, and M. D.
r = (2p + 1)/(2p + 2) for even N and r = (2p+1)(N +1)
(2p+2)N
for
Lukin, “Quantum Optimization for Maximum Inde-
odd N , and for p ≥ bN/2c one has r = 1. pendent Set Using Rydberg Atom Arrays,” (2018),
[33] C. G. Broyden, “The convergence of a class of double- arXiv:1808.10816.
rank minimization algorithms 1. general considerations,” [46] H. Pichler, S.-T. Wang, L. Zhou, S. Choi, and M. D.
IMA Journal of Applied Mathematics 6, 76–90 (1970); Lukin, “Computational complexity of the Rydberg block-
R. Fletcher, “A new approach to variable metric al- ade in two dimensions,” (2018), arXiv:1809.04954.
gorithms,” The Computer Journal 13, 317–322 (1970); [47] G. Giacomo Guerreschi and M. Smelyanskiy, “Practical
Donald Goldfarb, “A family of variable-metric methods optimization for hybrid quantum-classical algorithms,”
derived by variational means,” Mathematics of Compu- (2017), arXiv:1701.01450.
tation 24, 23–23 (1970); D. F. Shanno, “Condition- [48] Franz Rendl, Giovanni Rinaldi, and Angelika Wiegele,
ing of quasi-Newton methods for function minimization,” “Solving max-cut to optimality by intersecting semidef-
Mathematics of Computation 24, 647–647 (1970). inite and polyhedral relaxations,” Mathematical Pro-
[34] The initial points βi0 are drawn uniformly from [− π4 , π4 ), gramming 121, 307 (2008).
and γi0 are drawn uniformly [− π2 , π2 ) for u3R graphs or [49] Una Benlic and Jin-Kao Hao, “Breakout local search for
15
the max-cutproblem,” Engineering Applications of Arti- Mikhail D. Lukin, “High-fidelity control and entangle-
ficial Intelligence 26, 1162 – 1173 (2013). ment of rydberg-atom qubits,” Phys. Rev. Lett. 121,
[50] E. Cuthill and J. McKee, “Reducing the bandwidth of 123603 (2018).
sparse symmetric matrices,” in Proceedings of the 1969 [63] L. Isenhower, M. Saffman, and K. Mølmer, “Multibit C
24th National Conference, ACM ’69 (ACM, New York, k NOT quantum gates via Rydberg blockade,” Quantum
NY, USA, 1969) pp. 157–172. Information Processing 10, 755–770 (2011).
[51] Lan Lin and Yixun Lin, “Square-root rule of two- [64] Takahiro Inagaki, Yoshitaka Haribara, Koji Igarashi,
dimensional bandwidth problem,” RAIRO - Theoretical Tomohiro Sonobe, Shuhei Tamate, Toshimori Honjo,
Informatics and Applications 45, 399–411 (2011). Alireza Marandi, Peter L. McMahon, Takeshi Umeki,
[52] Wolfgang Lechner, Philipp Hauke, and Peter Zoller, “A Koji Enbutsu, Osamu Tadanaga, Hirokazu Takenouchi,
quantum annealing architecture with all-to-all connectiv- Kazuyuki Aihara, Ken-ichi Kawarabayashi, Kyo Inoue,
ity from local interactions,” Science Advances 1 (2015). Shoko Utsunomiya, and Hiroki Takesue, “A coherent
[53] Josep Díaz and Marcin Kamiński, “Max-cut and max- ising machine for 2000-node optimization problems,” Sci-
bisection are np-hard on unit disk graphs,” Theoretical ence (2016).
Computer Science 377, 271 – 276 (2007). [65] Peter L. McMahon, Alireza Marandi, Yoshitaka Harib-
[54] Henning Labuhn, Daniel Barredo, Sylvain Ravets, Syl- ara, Ryan Hamerly, Carsten Langrock, Shuhei Ta-
vain de Léséleuc, Tommaso Macrì, Thierry Lahaye, and mate, Takahiro Inagaki, Hiroki Takesue, Shoko Ut-
Antoine Browaeys, “Tunable two-dimensional arrays of sunomiya, Kazuyuki Aihara, Robert L. Byer, M. M. Fe-
single rydberg atoms for realizing quantum ising mod- jer, Hideo Mabuchi, and Yoshihisa Yamamoto, “A fully-
els,” Nature 534, 667 (2016). programmable 100-spin coherent ising machine with all-
[55] A. Keesling, A. Omran, H. Levine, H. Bernien, H. Pich- to-all connections,” Science (2016).
ler, S. Choi, R. Samajdar, S. Schwartz, P. Silvi, [66] Ryan Hamerly, Takahiro Inagaki, Peter L. McMa-
S. Sachdev, P. Zoller, M. Endres, M. Greiner, V. Vuletic, hon, Davide Venturelli, Alireza Marandi, Tatsuhiro On-
and M. D. Lukin, “Probing quantum critical dynam- odera, Edwin Ng, Carsten Langrock, Kensuke Inaba,
ics on a programmable Rydberg simulator,” (2018), Toshimori Honjo, Koji Enbutsu, Takeshi Umeki, Ry-
arXiv:1809.05540. oichi Kasahara, Shoko Utsunomiya, Satoshi Kako, Ken-
[56] Aishwarya Kumar, Tsung-Yao Wu, Felipe Giraldo, and ichi Kawarabayashi, Robert L. Byer, Martin M. Fejer,
David S. Weiss, “Sorting ultracold atoms in a three- Hideo Mabuchi, Dirk Englund, Eleanor Rieffel, Hiroki
dimensional optical lattice in a realization of maxwell’s Takesue, and Yoshihisa Yamamoto, “Experimental in-
demon,” Nature 561, 83–87 (2018). vestigation of performance differences between Coher-
[57] H. Häffner, C.F. Roos, and R. Blatt, “Quantum com- ent Ising Machines and a quantum annealer,” (2018),
puting with trapped ions,” Physics Reports 469, 155 – arXiv:1805.05217.
203 (2008). [67] Marin Bukov, Alexandre G. R. Day, Dries Sels, Phillip
[58] S. Debnath, N. M. Linke, C. Figgatt, K. A. Landsman, Weinberg, Anatoli Polkovnikov, and Pankaj Mehta, “Re-
K. Wright, and C. Monroe, “Demonstration of a small inforcement learning in different phases of quantum con-
programmable quantum computer with atomic qubits,” trol,” Phys. Rev. X 8, 031086 (2018).
Nature 536, 63 (2016). [68] Alireza Vahdatpour, Foad Dabiri, Maryam Moazeni, and
[59] J. Zhang, G. Pagano, P. W. Hess, A. Kyprianidis, Majid Sarrafzadeh, “Theoretical Bound and Practical
P. Becker, H. Kaplan, A. V. Gorshkov, Z. X. Gong, Analysis of Connected Dominating Set in Ad Hoc and
and C. Monroe, “Observation of a many-body dynami- Sensor Networks,” in Distributed Computing (Springer,
cal phase transition with a 53-qubit quantum simulator,” Berlin, Heidelberg, Berlin, Heidelberg, 2008) pp. 481–
Nature 551, 601 (2017). 495.
[60] R. Barends, A. Shabani, L. Lamata, J. Kelly, A. Mezza- [69] Pankaj K Agarwal, Marc van Kreveld, and Subhash Suri,
capo, U. Las Heras, R. Babbush, A. G. Fowler, B. Camp- “Label placement by maximum independent set in rect-
bell, Yu Chen, Z. Chen, B. Chiaro, A. Dunsworth, E. Jef- angles,” Computational Geometry 11, 209–218 (1998).
frey, E. Lucero, A. Megrant, J. Y. Mutus, M. Neeley, [70] G. E. Crooks, “Performance of the Quantum Approx-
C. Neill, P. J. J. O’Malley, C. Quintana, P. Roushan, imate Optimization Algorithm on the Maximum Cut
D. Sank, A. Vainsencher, J. Wenner, T. C. White, Problem,” (2018), arXiv:1811.08419.
E. Solano, H. Neven, and John M. Martinis, “Digitized [71] C. Moler and C. Van Loan, “Nineteen dubious ways to
adiabatic quantum computing with a superconducting compute the exponential of a matrix, twenty-five years
circuit,” Nature 534, 222 (2016). later,” SIAM Review 45, 3–49 (2003).
[61] C. Neill, P. Roushan, K. Kechedzhi, S. Boixo, S. V. [72] Roger B. Sidje, “Expokit: A software package for com-
Isakov, V. Smelyanskiy, A. Megrant, B. Chiaro, puting matrix exponentials,” ACM Trans. Math. Softw.
A. Dunsworth, K. Arya, R. Barends, B. Burkett, 24, 130–156 (1998).
Y. Chen, Z. Chen, A. Fowler, B. Foxen, M. Giustina, [73] Awad H Al-Mohy and Nicholas J Higham, “Computing
R. Graff, E. Jeffrey, T. Huang, J. Kelly, P. Klimov, the Action of the Matrix Exponential, with an Applica-
E. Lucero, J. Mutus, M. Neeley, C. Quintana, D. Sank, tion to Exponential Integrators,” SIAM Journal on Sci-
A. Vainsencher, J. Wenner, T. C. White, H. Neven, and entific Computing 33, 488–511 (2011).
J. M. Martinis, “A blueprint for demonstrating quantum [74] Navin Khaneja, Timo Reiss, Cindie Kehlet, Thomas
supremacy with superconducting qubits,” Science 360, Schulte-Herbrüggen, and Steffen J. Glaser, “Optimal
195–199 (2018). control of coupled spin dynamics: design of nmr pulse se-
[62] Harry Levine, Alexander Keesling, Ahmed Omran, quences by gradient ascent algorithms,” Journal of Mag-
Hannes Bernien, Sylvain Schwartz, Alexander S. Zibrov, netic Resonance 172, 296 – 305 (2005).
Manuel Endres, Markus Greiner, Vladan Vuletić, and
16
Appendix A: Optimal parameter pattern for stood via the effective mean-field strength that each qubit
weighed graphs experiences.
For complete graphs in Fig. 9(b), we observe that ~ γ∗
Here, we illustrate the pattern of optimal QAOA pa- for different weighted graphs have a narrower spread as
rameters for both weighted 3-regular (w3R) graphs and well as smaller value compared to both u3R and w3R
weighted complete graphs. The weight of each edge is graphs. This is because there is only one type of sub-
randomly drawn from uniform distribution on the inter- graph that every edge sees. The non-zero spread is at-
val [0, 1]. In Fig. 9, we illustrate the pattern by simulta- tributed to the fact that there are random weights on the
neously plotting the optimal parameters for 40 instances. edges. We also expect this spread to vanish as problem
In both cases, we see a pattern analogous to what was size N increases, when for a typical instance, the distri-
found for unweighted 3-regular (u3R) graphs in Fig. 2(b), bution of weights on the N − 1 edges incident to each
where γi∗ tend to increase smoothly and βi∗ tend to de- qubits converges. The smaller value of γ ~ ∗ can also be
crease smoothly with i = 1, 2, . . . , p. The optimal param- understood via the effective mean-field strength picture,
eter of graphs from the same class also appears to cluster as each qubit interacts with all N − 1 other qubits.
together in the same range.
We observe that the spread of ~ γ ∗ for w3R graphs in
Fig. 9(a) is wider than that for u3R graphs in Fig. 2(b). Appendix B: Details of heuristic optimization
This is because the random weights effectively increase strategies
the number of subgraph types. Moreover, the larger value
γ ∗ for w3R compared to u3R graphs can be under-
for ~ In the main text, we have proposed two classes of
heuristic strategies called INTERP and FOURIER for
generating initial points in optimizing QAOA parame-
1 1 1
p=3 p=4 p=5 ters. Both INTERP and FOURIER strategies work well
for all the instances we have examined. We have chosen
0.5 0.5 0.5 to use FOURIER for the results in the main text due to
the slight edge in its performance in finding better optima
when random perturbations are introduced. We now ex-
0 0 0 plain how these strategies work in details, and compare
1 2 3 1 2 3 4 1 2 3 4 5
their performances.
0.2 0.2 0.2
0.15 0.15 0.15
0.1 0.1 0.1 1. Interpolation-based strategy
0.05 0.05 0.05
p=3 p=4 p=5
0 0 0 In the optimization strategy that we call INTERP, we
1 2 3 1 2 3 4 1 2 3 4 5
use linear interpolation to produce a good initial point
for optimizing QAOA as one iteratively increases the
p=3 p=4 p=5 level p. This is based on the observation that the shape
0.4 0.4 0.4
of parameters (~ ∗ ~∗
,β
0.3 0.3 0.3
γ(p+1) (p+1) ) closely resembles that of
0.2 0.2 0.2 (~ ∗ ~
γ , β ).∗
(p) (p)
0.1 0.1 0.1 The strategy works as follows: For a given instance,
0 0 0 we iteratively optimize QAOA starting from p = 1, and
1 2 3 1 2 3 4 1 2 3 4 5 L ~L
increment p after obtaining a local optimum (~γ(p) , β(p) ).
0.2 0.2 0.2 In order to optimize parameters for level p + 1, we take
0.15 0.15 0.15 the optimized parameters from level p and produce initial
0.1 0.1 0.1 points (~ 0 ~0
,β
γ(p+1) (p+1) ) according to:
0.05 0.05 0.05
p=3 p=4 p=5 h i h i h i
0 0 0 0
~γ(p+1) = i−1
γ L
~ (p) + p−i+1 L
~γ(p) (B1)
1 2 3 1 2 3 4 1 2 3 4 5 i
p
i−1
p
i
for i = 1, 2, . . . , p + 1. Here, we denote [~ γ ]i ≡ γi

FIG. 9. Optimal QAOA parameters (~ ~ ∗ ) for 40 instances
γ∗, β as the i-th element of the parameter vector ~ γ , and
of 16-vertex (a) weighted 3-regular (w3R) graphs and (b) [~ L
γ(p) ]0 ≡ [~ L
γ(p) ]p+1 ≡ 0. The expression for β ~0
(p+1) is
weighted complete graphs, for 3 ≤ p ≤ 5. Each dashed line
the same as above after replacing γ → β. Starting
connects parameters for one particular graph instance. For 0
from (~γ(p+1) ,β~0
each instance and each p, the BFGS routine is used to opti- (p+1) ), we then optimize (e.g., using the
mize from 104 random initial points, and the best parameters BFGS routine) to obtain a local optimum (~γ L , β ~L ) (p+1) (p+1)
~ ∗ ).
γ∗, β
are kept as (~ for the (p + 1)-level QAOA. Finally, we increment p by
17
one and repeat the same process until a target level is vation that the basic FOURIER[∞, 0] strategy can some-
reached. times get stuck at a sub-optimal local optimum, whereas
perturbing its initial point can improve the performance
of QAOA. Here, in addition to optimizing according to
2. Details of FOURIER[q, R] strategy the basic strategy, we optimize (p + 1)-level QAOA from
R + 1 extra initial points, R of which are generated by
We now provide the details of our second heuristic adding random perturbations to the best of all local op-
strategy for optimizing QAOA that we call FOURIER. tima (~uB(p) , v
B
~ (p) ) found at level p. Specifically, for each
As described in the Sec. III B main text, here we use a instance at (p + 1)-level QAOA, and for r = 0, 1, . . . , R,
new representation of the p-level QAOA parameters as we optimize starting from
(~ ~ ) ∈ R2q where
u, v (
0,r uB
(~ (p) , 0), r=0
q
~ (p+1) =
u B P,r
X 1 1 π (~
u(p) + α~ u(p) , 0), 1 ≤ r ≤ R
γi = k−
uk sin i− , (B4)
2 2 p (
B
k=1
(B2) 0,r (~
v(p) , 0), r=0
q ~ (p+1) =
v B P,r
, 0), 1 ≤ r ≤ R

X 1 1 π (~
v(p) + α~ v(p)
βi = vk cos k − i− .
2 2 p
k=1
~ P,r
where u ~ P,r
(p) and v(p) contain random numbers drawn
Roughly speaking, the FOURIER strategy works by from normal distributions with mean 0 and variance
starting with level p = 1, optimize, and then reuse the
optimum at level p in (~ ~ )-representation to generate a
u, v
good initial point for level p + 1.
In fact, we propose several variants of the FOURIER
strategy for optimizing p-level QAOA: they are called
FOURIER[q, R], characterized by two integer parame-
ters q and R. The first integer q labels the maximum
frequency component we allow in the amplitude param-
eters (~ ~ ). When we set q = p, the full power of p-
u, v
level QAOA can be utilized, in which case we simply Level Level
denote the strategy as FOURIER[∞, R], since q grows FOURIER Initial Points Local Optima
unbounded with p. Nevertheless, the smoothness of the
optimal parameters (~ ~ implies that only the low-
γ , β)
frequency components are important. Thus, we can also FOURIER
consider the case where q is a fixed constant independent
of p, so the number of parameters is bounded even as
the QAOA circuit depth increases. The second integer best of
R is the number of random perturbations we add to the
parameters so that we can sometimes escape a local opti-
mum towards a better one. For the results shown in the
main text, we use the FOURIER[∞, 10] strategy, mean-
ing we set q = p and R = 10, unless otherwise stated.
In the basic FOURIER[∞, 0] variant of this strategy,
FIG. 10. Schematics of the two variants of FOURIER[q, R]
we generate a good initial point for level p + 1 by adding
heuristic strategy for optimizing QAOA, when R = 0 and
a higher frequency component, initialized at zero ampli- R > 0. Optimized parameters (green) at level p − 1 are used
tude, to the optimum at level p. More concretely, we to generate good initial points (blue) for optimizing at level
take parameters from a local optimum (~ uL L
~ (p)
(p) , v ) ∈ R2p p; the same process is repeated to optimize for p + 1. When
at level p, and generate a initial point (~u0(p+1) , v 0
~ (p+1) )∈ generating initial points, black dashed arrows indicate ap-
2(p+1) pending a higher frequency component [Eq. (B3)], and pink
R according to
dashed arrows correspond to adding random perturbations
[Eq. (B4)]. In the R > 0 variant, two local optima (green)
~ 0(p+1) = (~
u uL(p) , 0), v 0
~ (p+1) = (~ L
v(p) , 0). (B3) are kept in parallel at each level p for generating good ini-
uL
tial points: (~ (p) , v
L
~ (p) ) is the same optimum obtained from
Using (~ u0(p+1) , v 0
~ (p+1) ) as a initial point, we perform FOURIER[q, 0] strategy, and (~ uB B
~ (p)
(p) , v ) is the best of R + 2
BFGS optimization routine to obtain a local optimum local optima, R of which are obtained from perturbed ini-
uL
(~ (p+1) , v
L
~ (p+1) ) for the level p + 1. tial points. We find that keeping the (~ uL(p) , v
L
~ (p) ) optimum
We also consider an improved variant of the strategy, improves the stability of the heuristic, as random perturba-
FOURIER[∞, R > 0], which is sketched alongside the tions can sometimes lead to erratic and non-smooth optimal
R = 0 variant in Fig. 10. This is motivated by our obser- parameters.
18
~B
given by u (p) and v
B
~ (p) : -1
10
h i h i2
~ P,r
u (p) ∼ Normal 0, u B
~ (p) ,
k k
h i h i2 (B5)
P,r B
~ (p)
v ∼ Normal 0, v~ (p) .
k k 10 -2
INTERP
There is a free parameter α corresponding to the strength FOURIER[ ,0]
of the perturbation. Based on our experience from trial FOURIER[ ,10]
and error, setting α = 0.6 has consistently yielded good FOURIER[5,10]
results. This choice of α is assumed for the results in this 10 -3
paper. 0 10 20 30 40 50
We remark that while the INTERP strategy can also

FIG. 11. The fractional error achieved by optimizing using
get stuck in a local optimum, we find that adding per-
our various heuristics on an example instance of 14-vertex
turbations to INTERP does not work as well as to w3R graph. This is the same instance shown in Fig. 6(b).
FOURIER. We attribute this to the fact that the op-
timal parameters are smooth, and adding perturbations
in the (~ ~ )-space modify (~
u, v ~ in a correlated way,
γ , β)
which could enable the optimization to escape local op-
tima more easily. Hence, we consider here FOURIER
with perturbations, but not INTERP. Appendix C: Time-to-solution (TTS) for example
graph instance
Finally, we also consider variants of the FOURIER
strategy where the number of frequency components q
is fixed. These variants are the same as aforementioned
strategies where q = p, except we truncated each of In Sec. V B, we focused on a representative graph
the u ~ and v ~ parameters to keep only the first q com- instance where the adiabatic minimum gap is small
ponents. For example, when optimizing QAOA at level [Fig. 6(b)]. The low energy spectrum for the graph along
p ≥ q with the FOURIER[q, 0] strategy, we stop adding the QA path can be seen in Fig. 12(a). We remark that
higher frequency components and use the initial point only eigenstates that are invariant under the parity op-
QN
~ 0(p+1) = u
u ~L q
(p) ∈ R . erator P = i=1 σix are shown, since the Hamiltonian
HQA (s) commutes with P and the initial state is P-
invariant: P|ψ(0)i = |ψ(0)i.
Fig. 12(b) and Fig. 12(c) illustrate TTSQA and

TTSQAOA for the same graph. For QA, one can see
3. Comparison between heuristics that non-adiabatic evolution with T ≈ 20 yields orders
of magnitude shorter TTS than the adiabatic evolution.
We also see that TTS in the adiabatic limit is indepen-
We now examine the difference in the performance dent of the annealing time T , following
the Landau-Zener
among the various heuristic strategies we proposed. For formula pGS = 1 − exp −cT ∆2min . For QAOA, we use
our example instance in Fig. 11, we note that INTERP our FOURIER[∞, 10] heuristic strategy to perform the
and FOURIER[∞, 0] strategies have essentially identical numerical simulation up to pmax = 50, and compute
performance (except for small variations barely visible at TTSQAOA (p) and TTSopt QAOA (up to p ≤ pmax ). For this
large p). This is expected since both strategies generate particular graph, we note that although TTSopt QAOA occurs
initial points for optimizing level p + 1 based on smooth at p = 49, in practice it may be better to run QAOA at
deformation of the optimum at level p. In any case, p = 4 or p = 5 due to optimization overhead and error
the FOURIER[∞, 10] strategy outperforms INTERP and accumulation at deeper circuit depths. The apparent dis-
FOURIER[∞, 0] beginning at level p & 20. Interestingly, continuous jump in TTSQAOA is due to the corresponding
even when restricting the number of QAOA parameters jump in pGS (p), which can be explained by two reasons:
to 2q = 10, the FOURIER[5,10] strategy closely matches first, our heuristic strategy is not guaranteed to find the
the performance of other heuristics at low p and beats global optimum, and random perturbations may help the
the R = 0 heuristic at large p. This suggests that the algorithm escape a local optimum, resulting in a jump in
optimal QAOA parameters may in fact live in a small- ground state population; second, even when the global
dimensional manifold. Therefore, optimization for inter- optimum is found for all level p, there can still be discon-
mediate p-level QAOA can be dramatically simplified by tinuities in pGS , since the objective function of QAOA is
using our new parameterization (~ ~ ).
u, v energy instead of ground state population.
19
(a) 4 (a) 1
GS
3.5
-1 1E
instantaneous eigenstate
E - EGS
3 10
0.8 2E
1E
E - EGS
2.5 10-2
3E
2E 4E
2
population
3E -3 0.6
1.5 4E 10
5E 0.7 0.8 0.9 1
1
6E
0.5 0.4
7E
0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0.2
(b)
7 0
10
0.5 0.6 0.7 0.8 0.9 1
6
10
TTSQA
2
105 (b)
10
TTSQA
104
TTSopt
10 3 QA 101
Landau-Zener
100 101 102 103 104 105 106

0
10
(c) 1200
TTSQAOA
<latexit sha1_base64="zgnrvYOUpsvIxW07PeqD5QuG8Gc=">AAADrHicfVLbbtNAEN3GXIq5tfDIy4jIUnkgilMEPJabhIRARWraCtuq1ptJssp619pdN41c/wGPvMJ/8TdsHFfkQhnJ1uzMnDNnZyfNBTe22/291fJu3Lx1e/uOf/fe/QcPd3YfHRtVaIZ9poTSpyk1KLjEvuVW4GmukWapwJN08m6ePzlHbbiSR3aWY5LRkeRDzqh1oW+xxQtb7rFn1dlOu9vp1gabTtg4bdLY4dlu63s8UKzIUFomqDFR2M1tUlJtORNY+XFhMKdsQkcYOVfSDE1S1pIrCFxkAEOl3Sct1NFlREkzY2ZZ6iozasdmPTcP/isXFXb4Oim5zAuLki0aDQsBVsH8/jDgGpkVM+dQprnTCmxMNWXWTckPYLmPVLYek3E0n2egphKa1KqeNKt8PwjgrabPJ2jhyxUOgsCPJU6ZyjIqB2WcauoKqqiXlLGgciQQ2iFcQrsHsa7PrtV77uRA2pBdidggqqJwlcVB5xKuQdR9HeJyXvu3GXz6Xw9Wo/Y31brf/jLLquYpt2NQOWpq3QtzCSnaKeIaPz3H9TssMb5xa+uGC47gw0Xu3myh8Loxu/0N17d10znudULnf33RPnjZbPI2eUKekj0SklfkgHwkh6RPGJHkB/lJfnkd78iLvGRR2tpqMI/JinnDPy/zKr0=</latexit>
1000
TTSopt
QAOA 10-1
TTSQAOA
800
600
400 10-2
0.5 0.6 0.7 0.8 0.9 1
200
0
1 5 10 15 20 25 30 35 40 45 50 FIG. 13. (a) Instantaneous eigenstate populations along the
linear-ramp quantum annealing path given by Eq. (9) for the
example graph in Fig. 6(b). The annealing time is chosen to
FIG. 12. (a) Energy spectrum of low excited states (measured be T = T ∗ = 40, which corresponds to the time where the
from the ground state energy EGS ) along the annealing path diabatic bump occurs in Fig. 6(c). (b) Coupling between the
for the graph instance given in Fig. 6(b). Only states that instantaneous ground states and the first few excited states.
can couple to the ground state are shown,Q i.e., states that are The plotted quantities measure the degree of adiabaticity [as
invariant under the parity operator P = N x
i=1 σi . The inset explained in Eq. (D5)].
shows the energy gap from the ground state in logarithmic
scale. (b) Time-to-solution for the linear-ramp quantum an-
nealing protocol, TTSQA , for the same graph instance. TTS
in the long time limit follows a line predicted by the Landau-
Zener formula, which is independent of the annealing time T . hk |, the Schrodinger equation becomes
(c) TTS for QAOA at each iteration depth p. X
iȧk = k ak − i al hk |˙l i, (D3)
l6=k
Appendix D: Effective few-level understanding of where hk |˙k i = 0 is taken by absorbing the phase into
the diabatic bump the eigenvector |k i. Written in a matrix form, we have
ȧ0 0 −ih0 |˙1 i −ih0 |˙2 i · · · a0

    
In this Appendix, we elucidate the mechanism of the
 ȧ1  −ih1 |˙0 i ∆10 −ih1 |˙2 i · · ·  a1 
diabatic bump observed in Fig. 6(c) via an effective few- i
 ȧ2  = −ih2 |˙0 i −ih2 |˙1 i ∆20 · · ·  a2  ,
   
level dynamics. To study the intermediate dynamics dur- .. .. .. .. .. ..
ing quantum annealing, we can work in the basis of in- . . . . . .
stantaneous eigenstates |l (t)i, where (D4)
HQA (t)|l (t)i = l (t)|l (t)i. (D1) where we take the ground state energy 0 = 0 (by absorb-
ing it into the phase of the coefficients) and ∆i0 = i − 0
Expanding
P the time-evolved state in this basis as |ψ(t)i = is the instantaneous energy gap from the ith excited state
l a l (t)| l (t)i, the Schrodinger equation can be written to the ground state. The time evolution starts from the
as initial ground state with a0 = 1 and ai = 0 for i 6= 0, and
X X the adiabatic condition to prevent coupling to excited
i (ȧl |l i + al |˙l i) = l al |l i, (D2) states is
l l
dHQA dHQA
| | i | | i

where ~ = 1 and the time dependence in the notations |h0 |˙i i| h 0 dt i h0 ds i
= 2 = 2 1. (D5)
is dropped for convenience. Multiplying the equation by ∆i0 ∆i0 ∆i0 T
20
Bayesian Optimization Nelder Mead BFGS

0.5 0.5 0.5
FOURIER
INTERP
0.4 RI best of 50 0.4 0.4
RI avg of 50
0.3 0.3 0.3
0.2 0.2 0.2
0.1 0.1 0.1
0 0 0
0 5 10 15 0 5 10 15 0 5 10 15
FIG. 14. Comparison of three different optimization routines applied to 10 instances of 14-vertex w3R graphs. The initial
point of optimization is generated with either our heuristics (FOURIER or INTERP) or random initialization (RI). We plot
the fractional error, 1 − r, averaged over instances, for each optimization routine and each initialization strategy at each p.
The error bars are sample standard deviations from the 10 instances. For our heuristic strategy, the optimization starts with a
single initial point generated using our FOURIER[∞, 0] or INTERP heuristics as described in Appendix B. For the RI strategy,
we generate 50 random initial points uniformly in the parameter space, and optimize from each initial point; both the best and
the average of 50 RI runs are plotted.
The first equality can be derived from Eq. (D1). This pro- Appendix E: Comparing different classical
duces the standard adiabatic condition T = O(1/∆2min ). optimization routines
As we discussed in section V B, the minimum gap for
some graphs can be exceedingly small, so the adiabatic
In this appendix, we compare three different classical
limit is not practical. However, it may be possible to
routines that can be used to optimize QAOA parame-
choose an appropriate run time T , which breaks adia-
ters: Bayesian Optimization [37], Nelder-Mead [36], and
baticity, but is long enough such that only few excited
BFGS [33]. This comparison is done by a numerical
states are effectively involved in the dynamics. This is
experiment where we apply these optimization routines
the regime where the diabatic bump operates and one
to 10 instances of 14-vertex weighted 3-regular (w3R)
can understand the dynamics by truncating Eq. (D4) to
graphs. To compare them on equal footing, we termi-
the first few basis states.
nate each optimization run after a budget of 20p objec-
tive function evaluations is used. In the gradient-based
routine, BFGS, we include the cost of gradient estima-
tion via the finite-difference method into the budget of
20p objective function evaluations. For each routine, we
start at p = 1 and gradually increment p, and perform
optimization where the initial point is generated using
As an example, we plot in Fig. 13(a) the instantaneous either our heuristic strategies (FOURIER and INTERP)
eigenstate populations of the first few states. It is simu- or the standard strategy of random initialization (RI).
lated with the full Hilbert space, but effectively the same We use the versions of these optimization routines imple-
dynamics will be generated if the simulation is restricted mented in MATLAB R2017b as bayesopt, fminsearch,
to the first few basis states in Eq. (D4). Fig. 13(b) shows and fminunc, respectively. The objective function at
the strength of the couplings between the instantaneous each set of parameter is evaluated to floating point pre-
ground state and the low excited states. By comparing cision. The tolerance in both objective function value
Fig. 13(a) and Fig. 13(b), one can see that T = T ∗ = 40 and step size in parameter space, as well as the finite-
allows the time evolution to break the adiabatic con- difference-gradient step size, are chosen to be 0.01.
dition before the anticrossing: population leaks to the The result of our numerical experiment is plotted in
first excited state, which becomes the ground state af- Fig. 14. Similar to Fig. 3(a), we see that the average
ter the anticrossing. Thus, the time scale of T ∗ for the quality of local optimum found from 50 RI runs is much
diabatic bump represents a delicate balance between al- worse than the best, indicating the difficulty of optimiz-
lowing population to leak out of the ground state and ing in the QAOA parameter landscape without a good
suppressing excessive population leakage, which explains initial point. We also see, regardless of the classical rou-
why it happens at a certain range of time scale. tines chosen, one run of optimization from an initial point
21
generated from our heuristic strategies is generally bet- tively uses information from some given parameter point
ter than the best out of 50 runs from randomly generated (~ ~ to find a new parameter point (~
γ , β) γ0, β~ 0 ) that hope-
initial points. This indicates that our heuristic strategies fully produces a larger value of the objective function
work much better than RI and can be integrated with ~ 0 ) ≥ Fp (~
Fp (~γ 0 , β ~ In order for the algorithm to ter-
γ , β).
a variety of classical optimization routines. Moreover, minate, we need to set some stopping criteria. Here,
we find that Bayesian Optimization typically does not we specify two: First, we set an objective function tol-
do as well as Nelder-Mead or BFGS for larger p, which erance , such that if the change in objective function
is consistent with the folklore that this routine is better |F̄p,M 0 (~γ0, β ~ 0 ) − F̄p,M (~γ , β)|
~ ≤ , the algorithm termi-
suited for low-dimensional parameter space [37]. On the nates. We also set a step-tolerance δ, so that the algo-
other hand, it seems both Nelder-Mead and BFGS have rithm terminates if the new parameter point is very close
comparable performance, even when the cost of gradient to the previous one |~γ 0 −~γ |2 +|β ~ 0 −β|
~ 2 ≤ δ 2 . For gradient-
evaluation is taken into account. The slight difference we based optimization algorithms such as BFGS, we also
observe between the two in Fig. 14 is inconclusive and use δ as the increment size for estimating gradients via
can be attributed to sub-optimal choices of tolerance and the finite-difference method: ∂Fp /∂γi ' [F̄p,M 0 (γi + δ) −
step size parameters and the deliberately limited budget F̄p,M (γi )]/δ. In our simulations, we use the BFGS algo-
of objective function evaluations. rithm implemented as fminunc in the standard library of
MATLAB R2017b.
Using the approach described above, we simulate ex-
Appendix F: Details of Simulation with periments of optimizing QAOA with measurement pro-
Measurement Projection Noise jection noise for a few example instances, with various
choices of precision parameters (, ξ, δ) and initial points.
When running QAOA on actual quantum devices, the For the representative instance studied in Fig. 7, we set
objective function is evaluated by averaging over many = 0.1, ξ = 0.05, and δ = 0.01. In each run of the
measurement outcomes, and consequently its precision simulated experiment, we start with QAOA of level ei-
is limited by the so-called measurement projection noise ther p = 1 or p = 5, and optimize increasing levels
from quantum fluctuations. We account for this effect by of QAOA using our FOURIER[∞, 0] heuristic strategy.
performing full Monte-Carlo simulations of actual mea- The initial point of QAOA optimization is either ran-
surements, where the simulated quantum processor only domly selected (when starting at p = 1) or chosen based
outputs approximate values of the objective function ob- on an educated guess using optimal parameters from
tained by averaging M measurements: small-sized instances (at p = 1 and p = 5). Specifi-
cally, the educated guess for the initial points we use are
1 X
M u0 , v
(~ ~ 0 ) = (1.4849, 0.5409) at level p = 1, and
F̄p,M = C(zp,i ), (F1)
M i=1 ~ 0 = (1.9212, 0.2891, 0.1601, 0.0564, 0.0292),
u (F3)
0
where zp,i is a random variable corresponding to the ith ~ = (0.6055, −0.0178, 0.0431, −0.0061, 0.0141)
v (F4)
measurement outcome obtained by measuring |ψp (~ ~
γ , β)i at level p = 5. For each such run, we keep track of
in the computational basis, and C(z) is the classical the history of all the measurements, so that the largest
objective function. Note when M → ∞, we obtain cut Cuti found after the i-th measurement can be cal-
F̄p,M → Fp = hψp (~ ~
γ , β)|H C |ψp (~
~
γ , β)i. In the simula- culated. We repeat each experiment for 500 times with
tion, we achieve finite precision |F̄p,M − Fp | . ξ by sam- different pseudo-random number generation seeds, and
pling measurements until the cumulative standard error average over their histories.
of the mean falls below the target precision level ξ. In
other words, for each evaluation of Fp requested by the
classical optimizer, the number of measurements M per-
formed is set by the following criterion: Appendix G: Techniques to speed up numerical
simulation
v
u M
u 1 X
In this Appendix, we discuss a number of techniques
t [C(zp,i ) − F̄p,M ]2 ≤ ξ. (F2)
M (M − 1) i=1 we exploited to speed up the numerical simulation for
both QAOA and QA.
Roughly, we expect M ≈ Var(Fp )/ξ 2 . To address issues First, we make use of the symmetries present in the
with finite sample sizes, we also require that at least 10 Hamiltonian. For general graphs, the only symmetry op-
measurements be performed (M ≥ 10) for each objective erator that commutes with both HC and HB is the parity
QN
function evaluation. operator P = i=1 σix : [HC , P] = 0, [HB , P] = 0, and so
We now provide some details on the classical opti- does [HQA (s), P] = 0, where HQA (s) is the quantum an-
mization algorithm used to optimize QAOA parame- nealing Hamiltonian in Eq. (9). The parity operator has
ters. Generally, classical optimization algorithms itera- two eigenvalues, +1 and −1, each with half of the entire
22
Hilbert space. The initial state for both QAOA and QA where n̂ = |1ih1| = (1 − σ z )/2, and PIS is the projec-
is in the positive sector, i.e., P|+i⊗N = |+i⊗N . Thus, tion (or restriction) onto the subspace of independent set
any dynamics remain in the positive parity sector. We states span{|ψi : n̂i n̂j |ψi = 0 ∀(i, j) ∈ E}. The p-level
can rewrite HC and HB in the basis of the eigenvectors QAOA for MIS, first suggested by [2], involves the prepa-
of P, and reduce the Hilbert space from 2N to 2N −1 by ration of the following variational wavefunction
working in the positive parity sector.
p−1
For QA, dynamics involving the time-dependent ~ = e−iβp HQ
Y
Hamiltonian can be simulated by dividing the total sim- |ψp (~γ , β)i e−iγk HP e−iβk HQ |0i⊗N , (H2)
k=1
ulation time T into sufficiently small discrete time τ and
implement each time step sequentially. At each small where
step, one can evolve the state without forming the full !
evolution operator [71], either using Krylov subspace pro-
X
HQ = PIS σix PIS . (H3)
jection method [72] or a truncated Taylor series approx- i
imation [73]. In our simulation, we used a scaling and
squaring method with a truncated Taylor series approx- Similar to the case of MaxCut, here QAOA works
imation [73] as it appears to run slightly faster than the by repeatedly measuring the system in the compu-
Krylov subspace method for small time steps. tational basis to obtain an estimate of Gp (~ ~ =
γ , β)
For QAOA, the dynamics can be implemented in a hψp (~ ~ ~
γ , β)|HP |ψp (~γ , β)i, and using a classical computer
more efficient way due to the special form of the operators to search for the best variational parameters (~ γ∗, β ~ ∗ ) so
HC and PHB . wWe work in the standard σz basis. Thus, as to maximize Gp . We note the evolution can be imple-
HC = hi,ji 2ij (1 − σiz σjz ) can be written as a diagonal mented using the following physical Hamiltonian
matrix and the action of e−iγHC can be implemented as
vector operations. For HB , the time evolution operator
X X
MIS
Hphysical (t) = [∆(t)n̂i + Ω(t)σix ] + U n̂i n̂j . (H4)
can be simplified as i hi,ji
N N
Y x Y In the U |∆|, |Ω| limit, the system is constraint to
e−iβHB = e−iβσj = 1 cos β − iσjx sin β . (G1)

the manifold where n̂i n̂j = 0 for all edges hi, ji (since
j=1 j=1
the initial state is |0i⊗N ), and the QAOA circuit can be
performed by setting appropriate waveforms of ∆(t) and
Therefore, the action of e−iβHB can also be imple-
Ω(t). See Ref. [45] for an implementation scheme of MIS
mented as N sequential vector operations without ex-
with a platform of neutral atoms interacting via Rydberg
plicitly forming the sparse matrix HB , which both im-
excitations.
proves simulation speed and saves memory. In addition,
in the optimization of variational parameters, we calcu-
lated the gradient analytically, instead of using finite- After performing exhaustive search of QAOA param-
difference methods. Techniques similar to the gradi- eters using randomly initialized optimization for many
ent ascent pulse engineering (GRAPE) method [74] were instances of MIS, we discover a similar pattern. This
used, which reduces the cost of computing the gradient is illustrated in Fig. 15(a), where we see that the opti-
from O(p2 ) to O(p), for a p-level QAOA. Lastly, in our mal parameters at p = 3 tend to cluster in two visu-
FOURIER strategy, we need to calculate the gradient of ally distinct groups. For one of the groups, the smooth
the objective function with respect to the new param- curve underlying the parameters exhibit resemblance to
~ = AC v a quantum annealing protocol, using a time-dependent
eters (~ ~ ). Since ~
u, v γ = AS u~ and β ~ for some
Hamiltonian:
matrices AS and AC , their gradients are also related via
∇u~ = AS ∇~ γ and ∇v
~ = AS ∇ β
~. MIS
HQA (t) = fP (t)HP + fQ (t)HQ . (H5)
Appendix H: QAOA for Maximum Independent Set

For example, if we choose (fP , fQ ) such that fP (0) >
0 and fP (T ) < 0, and fQ (0) = fQ (T ) = 0, then we
can initialize the system in |0i⊗N , which is the ground
In this Appendix, we briefly illustrate the generality of state of HQAMIS
(t = 0), and evolve adiabatically to reach
our results by applying QAOA to another class of com- MIS
the ground state of HQA (t = T ) ∝ −HP (i.e. state
binatorial optimization problems called Maximum Inde-
encoding the MIS solution). The Hamiltonian HQ , which
pendent Set (MIS). Given a graph G = (V, E), the MIS
is turned on in the middle of the time evolution, induces
problem concerns finding the largest independent set—a
couplings between different independent-set basis states
subset of vertices where no two of which share an edge.
and opens a spectral gap between the ground state and
In other words, the problem Hamiltonian is
excited states. For concreteness of discussion, we focus
X
! on the following choice of annealing schedule:
HP = PIS n̂i PIS (H1)
i fPc (ξ) = 6(1 − 2ξ), fQ
c
(ξ) = sin2 (πξ), ξ(t) = t/T. (H6)
23
2 1
2 1.5
0.5
0 1
0
-2 p=3 0.5 0 0.2 0.4 0.6 0.8 1
1 2
0 1
eigenstate population
0.6 0.65 0.7 0.75
1.5 0.8
instantaneous
p=3 1 0.6
1
0.4
0.5 0.5
0.2
0 0
1 2 3 0
100 101 102 103 104 105 106 107 0 0.2 0.4 0.6 0.8 1
FIG. 15. (a) Pattern in the optimal QAOA parameters at level p = 3 for 20 random instances of Maximum Independent Set
(MIS). Each dashed line connects parameters for one particular graph instance. (b) The energy difference relative to the ground
state in the quantum annealing protocol of Eq. (H6) for an example 32-vertex MIS instance. The annealing protocol progresses
as the parameter ξ increases from 0 to 1. The minimum spectral gap between the ground state (GS) and the first excited state
(1E) is ∆min = 0.0012 at ξ = 0.6666. (c) Comparing performance of quantum annealing and QAOA on the example instance, in
terms the ground
P state population at the end of the quantum evolution. The equivalent evolution time for QAOA is calculated
via TQAOA = p−1
Pp
i=1 |γi | + i=1 |βi |. (d) The effective annealing schedule converted from optimized 25-level QAOA parameters
for the example MIS instance. (e) The population of the system in the instantaneous eigenstates, during the effective annealing
schedule that approximates the dynamics under 25-level QAOA. Here, we observe that the algorithm attempts to transport
the system to the fifth excited state, keeping it there before it undergoes a series of anti-crossings towards the ground state.
We further analyze the performance and mechanism procedure as in Sec. V B. This annealing path is visual-
of QAOA for MIS by focusing on example instances ized in Fig. 15(d), where we plot the effective ξeff (t) de-
that are difficult for adiabatic quantum annealing due fined by fPc (ξeff (t))/fQ
c
(ξeff (t)) = fPQAOA (t)/fQ
QAOA
(t) for
to small spectral gaps. In Fig. 15(b), we show the the QAOA-like schedule. We then monitor populations
level-crossing structure for such an example instance, in the instantaneous eigenstates during the evolution to
where the minimum spectral gap is ∆min = 0.0012. gain insights into the mechanism of QAOA. As shown in
The same instance is studied in Ref. [45]. To study Fig. 15(e), QAOA is able to learn to navigate a very com-
the performance of QAOA in deeper-depth circuits, we plicated level-crossing structure by a combination of adi-
use the interpolation-based heuristic strategy outlined in abatic and non-adiabatic operations: the system diabat-
Appendix B 1 to optimize QAOA parameters for this ically couples to the excited states, lingers to maximize
example instance starting at level p = 3. The per- population in the fifth excited state, and then exploits a
formance of QAOA and quantum annealing are then series of anti-crossings to return to the ground state. Our
compared in Fig. 15(c), where we see that QAOA is results here demonstrate that the non-adiabatic mecha-
able to obtain a much larger ground state population nisms observed in QAOA for MaxCut can play a signifi-
in much shorter time compared to the adiabatic time cant role in more general problems, such as difficult MIS
scale of 1/∆2min ≈ 106 . We then study the mechanism instances.
of QAOA by converting its parameters at level p = 25
to a smooth annealing path (fPQAOA , fQQAOA
) in a similar

Quantum Approximate Optimization Algorithm: Performance, Mechanism, and Implementation On Near-Term Devices

Uploaded by

Copyright:

Available Formats

Quantum Approximate Optimization Algorithm: Performance, Mechanism, and Implementation On Near-Term Devices

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Quantum Approximate Optimization Algorithm: Performance, Mechanism, and Implementation On Near-Term Devices

Uploaded by

Copyright:

Available Formats

Quantum Approximate Optimization Algorithm: Performance, Mechanism, and

Implementation on Near-Term Devices

I. INTRODUCTION by classical computers [10]. It is thus an appealing algo-

0.4 0.2 udR graphs creates redundancy since e−iπHC = 1 if d is

After performing the above numerical experiment for

B. Heuristic optimization strategy for large p 0.5

# randomly initialized (RI)

runs to match heuristic

LZ sible with non-adiabatic mechanisms when adiabatic QA

To understand the behavior of QAOA, we focus on

0.37 0.89 0.2

correlated with TTSoptQA : QAOA automatically finds a 0

stance, as we see in Fig. 6(f), can be understood mostly

VI. CONSIDERATIONS FOR EXPERIMENTAL

So far, we have focused on understanding the best the-

B. Implementation for large problem sizes Original graph G Vertex renumbering

highly advantageous: the hybrid nature of QAOA as well 0.1

as its short- and intermediate-depth circuit parametriza-

for i = 1, 2, . . . , p + 1. Here, we denote [~ γ ]i ≡ γi

We remark that while the INTERP strategy can also

Fig. 12(b) and Fig. 12(c) illustrate TTSQA and

100 101 102 103 104 105 106

ȧ0 0 −ih0 |˙1 i −ih0 |˙2 i · · · a0

Bayesian Optimization Nelder Mead BFGS

0.3 0.3 0.3

0.2 0.2 0.2

0.1 0.1 0.1

Appendix H: QAOA for Maximum Independent Set

You might also like

ȧ0 0 −ih0 |˙1 i −ih0 |˙2 i · · · a0