Sparks of Quantum Advantage and Rapid Retraining in Machine Learning
William Troy1*
1Independent Researcher
*Corresponding Author: [email protected]
Abstract
The advent of quantum computing holds the potential to revolutionize various fields by
solving complex problems more efficiently than classical computers. Despite this promise,
practical quantum advantage is hindered by current hardware limitations, notably the small
number of qubits and high noise levels. In this study, we leverage adiabatic quantum computers
to optimize Kolmogorov-Arnold Networks, a powerful neural network architecture for
representing complex functions with minimal parameters. By modifying the network to use
Bézier curves as the basis functions and formulating the optimization problem into a Quadratic
Unconstrained Binary Optimization problem, we create a fixed-sized solution space, independent
of the number of training samples. Our approach demonstrates sparks of quantum advantage
through faster training times compared to classical optimizers such as Adam, Stochastic
Gradient Descent, Adaptive Gradient, and simulated annealing. Additionally, we introduce a
novel rapid retraining capability, enabling the network to be retrained with new data without
reprocessing old samples, thus enhancing learning efficiency in dynamic environments.
Experimental results on initial training of classification and regression tasks validate the efficacy
of our approach, showcasing significant speedups and comparable performance to classical
methods. Experiments on retraining demonstrate a sixty-fold speedup using adiabatic
quantum computing based optimization compared to the gradient descent based
optimizers, with theoretical models suggesting this speedup could be even larger. Our findings
suggest that with further advancements in quantum hardware and algorithm optimization,
quantum-optimized machine learning models could have broad applications across various
domains, with initial focus on rapid retraining.
1. Introduction
The advent of quantum computing (QC) promises to revolutionize various fields by
solving complex problems more efficiently than classical computers. This is possible as quantum
computers leverage the principles of superposition and entanglement to perform computations
that would be infeasible for classical computers, potentially offering exponential speedups for
certain types of problems1–3. For instance, Shor's algorithm for factoring large numbers can
theoretically break widely used cryptographic systems much faster than the best classical
algorithms4. Despite this promise, the practical realization of quantum advantage is hindered by
current limitations in quantum hardware, notably the relatively small number of qubits and high
noise levels on modern quantum processors. These limitations make it challenging to solve
large-scale problems and require highly optimized algorithms to make the most of the existing
quantum resources5. To date, quantum advantage has only been demonstrated a couple of times,
notably by Google’s Sycamore processor and in Gaussian boson sampling experiments6,7.
While these achievements mark significant milestones, quantum advantage in other
domains remains difficult to achieve. A particularly promising area of application for QC is
quantum machine learning (QML), which integrates quantum computing with machine learning
techniques to leverage quantum speedups for training and inference tasks8–10. Platforms like
TensorFlow and Qiskit have already started integrating quantum models, providing tools for
researchers to explore QML applications11,12. Despite these advancements, Google recently
highlighted a significant gap in the field at Google I/O 2024, stating that no one has yet
demonstrated a clear quantum advantage for machine learning on classical data13. This gap
underscores the need for innovative approaches that can bridge this divide and showcase the
practical benefits of quantum computing in machine learning.
In this study, we address this challenge by leveraging adiabatic quantum computing to
optimize Bézier Kolmogorov-Arnold Networks (KANs). KANs represent a powerful neural
network architecture known for their unique ability to represent complex functions with a
relatively small number of parameters14,15. However, due to the complexities of the model, the
optimization of KANs is currently quite slow compared to that of multi-layer perceptrons. By
reformulating the optimization problem from a classical backpropagation algorithm to a
Quadratic Unconstrained Binary Optimization (QUBO) problem, we are able to create a
fixed-size solution space that is independent of the number of training samples. Using D-Wave
quantum annealers as our adiabatic quantum computers we are then able to find the optimal
weights for this network where the number of qubits and run time on the quantum computer is
independent of the number of training samples.
Our approach demonstrates sparks of quantum advantage when testing on small models
which are able to fit within the currently limited number of qubits available. This demonstration
is done by achieving faster training times compared to classical optimizers. These classical
optimizers include one of the most popular optimization strategies, the Adam optimizer16–20, as
well as Stochastic Gradient Descent21 (SGD), Adaptive Gradient22 (AdaGrad), and the classical
parallel to quantum annealing, simulated annealing23,24. Moreover, we
introduce a novel rapid retraining capability, enabled by the ability to retrain with new data
without the need to reprocess old samples. This feature is particularly advantageous in dynamic
environments where data is continuously evolving, such as real-time analytics, due to the fact
that a quantum annealer can perform an entire optimization in tens of microseconds25.
2. Results
We conducted a series of experiments to evaluate the performance of our Bézier-KANs
optimized using different methods: backpropagation with the continuous variable gradient
descent based optimizers via pytorch26, QUBO with simulated annealing, and QUBO with
quantum annealing. Our results cover five separate tasks which include two classification tasks
with 1-layer KANs and three regression tasks using multi-layer KANs. Additionally, we explore
the rapid retraining capability of our KAN formulation and its impact on training speed. Tasks
chosen remain relatively simplistic due to the extremely limited number of quality qubits
available in current hardware.
Figure 1. The training and testing metrics of Bézier-KANs optimized using classical optimizers and quantum
annealing on (a) the circle dataset and (b) the moon dataset. (c) The respective training times for these different
optimization scenarios.
Figure 2. (a) MSE and (b) R-Squared values of Bézier-KANs trained on regression dataset 1 using classical
optimizers and hybrid quantum annealing. (c) MSE and (d) R-Squared values of Bézier-KANs trained on regression
dataset 2 using an Adam optimizer, simulated annealing, and pure quantum annealing. (e) Time taken for initial
training and retraining using our different optimizers on regression dataset 1 and (f) on regression dataset 2. The
initial training had 1,000,000 data points while each retraining added 100,000 new data points.
It should be mentioned that the hybrid quantum annealer used here finds solutions in
around three seconds, as opposed to the tens of microseconds of a pure quantum annealer. It was
necessary to use a hybrid quantum annealer for some of our regression tasks as they use a greater
number of variables and hybrid quantum annealers can more consistently solve for a larger
number of variables than current pure quantum annealers can. Despite the large time increase
introduced to the quantum solutions by using a hybrid annealer, our hybrid method is able to
perform all four rounds of iterative retraining in an eighth of the time taken to perform a single
round of training by the gradient descent based optimizers for regression dataset 1. On top of this,
even the simulated annealer dramatically outperforms the gradient descent based
optimizers on these tasks, as it does not have to reprocess already seen samples. This is taken
a step further on regression dataset 2, where the pure quantum annealer is able to perform all four
rounds of retraining in less than 2% of the time taken by the gradient descent based optimizers to
perform the same task.
Regression dataset 3, given by Eq. 12, involved a more complex KAN with a larger
number of binary variables, over 200. Due to this large number of binary variables, current
quantum annealers struggled to encode the task well and converge, while the simulated annealer
managed to find a solution if allowed enough runs. To probe this, we varied the degree of one
of the Bézier curves on the bottom layer from one to two, leaving the rest of the network
untouched, with the other bottom Bézier curve and the top Bézier curve having degrees of two and
one, respectively, and performed 50 hybrid optimizations for each of the two modified networks
(Figure 3).
Figure 3. Distribution of (a) MSE and (b) R-Squared values for 100 hybrid optimizations of a Bézier-KAN, where
the degree of a Bézier curve in the first layer was changed from 1 to 2.
From this it can be seen that, against what one would expect, increasing the flexibility
of the model to learn by increasing the degrees it has access to can actually negatively affect
the model by increasing the chance of finding a less optimal solution. Although this is
unexpected, we attribute it to the current state of quantum annealers, in which the number of
variables is limited while both noise and chain breaks are very prevalent. For comparison,
simulated annealing was able to find a solution with an MSE value of 0.0041 and R-Squared of
0.963 when the Bézier curve being modified had a degree of one.
3. Discussion
Despite the limitations in the number and quality of current qubits, our work
demonstrates that it is already possible to create machine learning frameworks with training
speeds that outperform current classical methods in specific scenarios while maintaining
comparable performance. This advantage is particularly evident when retraining networks with
additional data, with pure quantum annealing, hybrid quantum annealing, and simulated
annealing all performing retraining faster than the gradient descent based optimizers, and pure
quantum annealing having the fastest retraining times. This success is achieved by creating
fixed-sized solution spaces with states that can be saved and rapidly solved using an annealer.
We anticipate further improvements in rapid retraining speed as advancements in quantum
hardware allow pure quantum annealers to accurately solve more complex problems.
In theory we should be seeing even larger speedups. This is because all that is needed for
our quantum optimization is a forward pass of the network before quantum annealing. Given that
theoretical annealing runs in a fraction of a second regardless of the problem size, the main time
component of the quantum optimization framework is the singular forward pass. Comparing this
to classical back propagation algorithms that require both a forward and backward pass for each
training epoch, it is easy to see that we should be seeing much greater experimental speedups. As
our experiments require 50 to 150 epochs of classical training, even without including the time
complexity of gradient calculations in backwards passes, one would expect to see speedups on
the scale of the number of training epochs.
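As a rough back-of-the-envelope estimate (treating the annealing time as negligible and counting only forward and backward passes), the expected speedup takes the form

\text{speedup} \approx \frac{E\,(T_{\text{fwd}} + T_{\text{bwd}})}{T_{\text{fwd}}} \geq E

where E is the number of classical training epochs (50 to 150 in our experiments), and T_{\text{fwd}} and T_{\text{bwd}} denote the cost of one forward and one backward pass, respectively.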
We attribute this discrepancy between theoretical speedups and what we see
experimentally to code optimization. Given that data preprocessing constitutes a vast majority of
the runtime in most of our experiments (Supplementary Material S1), we anticipate that further
optimization of our code will lead to experimental speedups closer to the theoretical ones.
There is also significant room for improvement of this algorithm with additional research
focused on optimizing the quantum annealing process, reducing the number of auxiliary
variables, problem decomposition, implementing similar approaches on gate-based quantum
computers29, and exploring the continuous optimization space, which is known to be simpler than
that of the discrete space30. Perhaps the most promising area of all for advancing this
methodology will be all-to-all connected quantum annealers31,32. All-to-all connectivity in
quantum annealers promises to drastically reduce, if not eliminate, the need for auxiliary
variables when solving HUBO problems33. This would theoretically eliminate the scalability
issue of our algorithm caused by the increasing need for auxiliary variables as the number of
layers in a network increases, allowing for a one-to-one qubit to variable representation of the
entire system.
With these things in mind, we foresee this as just the first of a new class of
quantum-optimizable machine learning models that create fixed-sized solution spaces. Exploring
alternative KAN formulations using other spline functions, Taylor series expansions of spline
basis functions, other objective functions, or entirely different machine learning models could
lead to the development of more models that fit within this class.
As potential applications of this model type span various fields where rapid
retraining can have a large effect, including finance, healthcare, environmental monitoring, and
cybersecurity, there is a large incentive to expand on and optimize our proposed model type.
Along with this, although we already demonstrate experimental speedups in the training time of
our model compared to classical optimization, continued advancements in quantum computing
will naturally enhance the performance of our model, allowing for a potential future in which
complex machine learning networks could theoretically be trained and retrained on quantum
computers at speeds dramatically faster than their classical counterparts are capable of.
4. Methods
KANs are a neural network architecture designed to represent complex functions with a
minimal number of parameters. In order to achieve this representation, KANs build networks
that are composed of basis functions. The original KAN representation used B-spline basis
functions summed with weighted activation functions to manipulate edges of a network.
However, it has also been shown that many other types of basis functions including
Fourier-splines34 and wavelets15 can also work to create KANs. While we are able to use our end
formulation to optimize one-layer KANs built from many different basis functions, including B-splines,
Fourier-splines, and Gaussian radial basis function splines (Supplementary Material S2), these
spline types are not able to fit into the QUBO framework when stacked. This is because they are
composed of either non-differentiable base functions when fully expanded or functions whose
variables cannot be expressed in linear and quadratic terms alone. While this is not a
problem for traditional optimization methods, as we will see, it becomes a significant issue when
optimizing a multi-layered KAN using QUBO methods.
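For illustration, a minimal forward pass of a small Bézier-KAN might look like the following sketch; the [2,1,1] shape, the clipping of the hidden value, and all numbers are assumptions for demonstration only, not the exact networks used in this work.

```python
import numpy as np
from math import comb

def bezier(t, control_points):
    """Evaluate a Bezier curve of degree len(control_points)-1 at t in [0, 1]."""
    m = len(control_points) - 1
    return sum(comb(m, i) * (1 - t) ** (m - i) * t ** i * p
               for i, p in enumerate(control_points))

def kan_forward(x, y, layer1_cps, layer2_cps):
    """Toy forward pass of a [2, 1, 1] Bezier-KAN:
    one edge curve per input, summed, then fed through a top curve."""
    hidden = bezier(x, layer1_cps[0]) + bezier(y, layer1_cps[1])
    # Inputs to each curve are assumed to be normalized to [0, 1].
    return bezier(np.clip(hidden, 0.0, 1.0), layer2_cps)

# Example: degree-2 curves on the bottom layer, degree-1 curve on top.
z = kan_forward(0.3, 0.7,
                layer1_cps=[[0.0, 0.2, 0.5], [0.1, 0.4, 0.6]],
                layer2_cps=[0.0, 1.0])
print(z)
```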
A QUBO problem takes the form of minimizing, over binary variables q_i ∈ {0, 1},

f(\vec{q}) = \sum_{i=1}^{N} Q_{i,i} q_i + \sum_{i<j}^{N} Q_{i,j} q_i q_j    (2)

where Q_{i,i} and Q_{i,j} denote the diagonal and off-diagonal terms, respectively, of a QUBO matrix35.
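As a minimal illustration (not the solver configuration used in this work), a QUBO of this form can be posed as a coefficient dictionary and handed to D-Wave's simulated annealing sampler; the coefficient values below are made up.

```python
import neal  # D-Wave's simulated annealing sampler (dwave-neal)

# QUBO as a dictionary: diagonal entries are the linear terms Q_ii,
# off-diagonal entries are the quadratic couplings Q_ij (illustrative values).
Q = {
    (0, 0): -1.0,
    (1, 1): -1.0,
    (0, 1):  2.0,
}

sampler = neal.SimulatedAnnealingSampler()
sampleset = sampler.sample_qubo(Q, num_reads=100)

best = sampleset.first
print(best.sample, best.energy)  # lowest-energy bit assignment found
```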
These binary variables in our case represent the control points in our basis functions that
make up the KANs. These continuous control points can be made into binary variables using
multiple forms of binary discretization, commonly in radix-2 form:
x_i \approx \sum_{l=-m}^{m} 2^{l} q_{i,l}    (3)
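A small sketch of this discretization is shown below; the precision m and the greedy encoder are assumptions for demonstration, not the exact encoding used in this study.

```python
import numpy as np

def radix2_decode(bits, m):
    """Reconstruct a continuous value from binary variables q_l weighted 2**l,
    l = -m, ..., m, as in Eq. (3)."""
    exponents = range(-m, m + 1)
    return sum(b * 2.0 ** l for b, l in zip(bits, exponents))

def radix2_encode(x, m):
    """Greedy approximation of x with the same weights (illustration only)."""
    bits = []
    remainder = x
    for l in range(m, -m - 1, -1):        # largest weight first
        weight = 2.0 ** l
        bit = 1 if remainder >= weight else 0
        bits.append(bit)
        remainder -= bit * weight
    return list(reversed(bits))           # reorder to l = -m, ..., m

m = 3
x = 0.6
bits = radix2_encode(x, m)
print(bits, radix2_decode(bits, m))       # approximate reconstruction of 0.6
```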
While QUBO problems are ideal for quantum annealing, formatting a KAN
optimization problem as one presents several unique challenges. The first difficulty is that the
variables of interest must be expressible in only linear or quadratic terms. The
second difficulty lies in the need to fully expand the basis functions of a KAN into their
component terms. This expansion is crucial as the entire KAN can be thought of as one large
equation with each layer feeding into the next. This leads to a problem when functions used to
make a layer are discontinuous and reliant on one knowing the output of the previous layer.
Unlike traditional neural network optimization, where the current output of a layer is known
because weights are iteratively adjusted to minimize the loss function via gradient descent, QUBO does
not initialize the weights before solving. As QUBO solves for all weights at the same time, on
what can be thought of as a black box, this approach eliminates the iterative modification of
variables which is so time consuming in backpropagation. However, this QUBO formulation
brings the extra requirements that variables be expressed in linear or quadratic terms and that
all functions involved be continuously differentiable.
A Bézier curve of degree m is given by

B(t) = \sum_{i=0}^{m} \binom{m}{i} (1 - t)^{m-i} t^{i} P_i

where P_i are the control points and t \in [0, 1]. To formulate this into a QUBO format, the
continuous P variables are then discretized using variations of radix-2 encoding. This makes the
theoretical number of logical qubits needed to solve the problem directly related to the number of
control points in a given problem and the number of binary variables used to represent each
control point.
4.3. Auxiliary Variables and HUBO Formulation
From this, one can see how this may lead to variable interactions beyond the quadratic
order, the maximum order that a QUBO formulation can solve for, creating a Higher Order
Unconstrained Binary Optimization (HUBO) problem (Supplementary Material S3). While this
may seem like a major problem, one can get around it by introducing auxiliary variables,
with a single variable being used to represent the product of two or more others. An auxiliary
variable that represents two other variables can be given by:
V_{AUX} = P_1 P_2    (6)
This relation is enforced by adding penalty terms to the QUBO problem of the form:
w\left(P_1 P_2 - 2 P_1 V_{AUX} - 2 P_2 V_{AUX} + 3 V_{AUX}\right)

where w is the penalty coefficient37, set in this study to 15-25 times the maximum coefficient value in our
QUBO matrix.
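A minimal sketch of this reduction, using the standard Rosenberg-style penalty (which may differ in detail from the exact form used in this work), could look like the following; the specific cubic term, auxiliary index, and weight are illustrative.

```python
from collections import defaultdict

def add_cubic_term(qubo, i, j, k, coeff, aux, w):
    """Replace coeff * q_i * q_j * q_k by coeff * q_aux * q_k plus the standard
    penalty w * (q_i q_j - 2 q_i q_aux - 2 q_j q_aux + 3 q_aux), which is zero
    exactly when q_aux = q_i * q_j and positive otherwise."""
    qubo[(aux, k)] += coeff
    qubo[(i, j)] += w
    qubo[(i, aux)] += -2 * w
    qubo[(j, aux)] += -2 * w
    qubo[(aux, aux)] += 3 * w

qubo = defaultdict(float)
# Example: reduce 1.5 * q0 * q1 * q2, using q3 as the auxiliary variable.
# Penalty weight chosen ~15-25x the largest QUBO coefficient, per the text.
add_cubic_term(qubo, 0, 1, 2, coeff=1.5, aux=3, w=25.0)
print(dict(qubo))
```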
This makes the total number of logical qubits needed to optimize a network equal to the
number of control points multiplied by the number of binary variables used to represent each
control point plus the number of auxiliary variables:
N_{qubits} = num_{control\_vars} \times num_{binary\_per\_var} + num_{aux\_vars}
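In code, this count is simple arithmetic; the numbers below are placeholders for illustration only.

```python
def logical_qubits(num_control_points, bits_per_point, num_aux_vars):
    """Logical qubits = control points x bits per point + auxiliary variables."""
    return num_control_points * bits_per_point + num_aux_vars

# e.g. 20 control points encoded with 4 binary variables each, plus 30 aux vars
print(logical_qubits(20, 4, 30))  # 110 logical qubits
```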
The objective function we minimize is the mean squared error (MSE):

MSE = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2    (8)

where n is the number of training samples, y_i is the desired output at sample i, and \hat{y}_i is the
network output. This objective function is a perfectly valid one and is commonly used in
machine learning; however, if optimized to its fullest extent it can easily lead to overfitting of the
training data. Since we cannot check for overfitting between training iterations, as we can
classically, we re-formulate the objective function so that it pays attention to a validation dataset
on its own:
OBJ = \frac{1}{n_t} \sum_{t=1}^{n_t} \left(y_t - \hat{y}_t\right)^2 + \frac{\lambda}{n_v} \sum_{v=1}^{n_v} \left(y_v - \hat{y}_v\right)^2    (10)
with n_t and n_v denoting the number of training and validation samples, respectively, and λ
representing the validation dataset weighting.
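A minimal numpy sketch of this combined objective is shown below; the data and the value of λ are placeholders.

```python
import numpy as np

def objective(y_train, yhat_train, y_val, yhat_val, lam=1.0):
    """Training MSE plus lambda-weighted validation MSE, as in Eq. (10)."""
    train_term = np.mean((y_train - yhat_train) ** 2)
    val_term = np.mean((y_val - yhat_val) ** 2)
    return train_term + lam * val_term

rng = np.random.default_rng(0)
y_t, y_v = rng.random(1000), rng.random(200)
print(objective(y_t, y_t + 0.01, y_v, y_v + 0.05, lam=0.5))
```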
z = 2\left(1 + x^{2} + y^{2}\right)    (12)
The dataset inputs were normalized from 0 to 1 so that inputs are within the preferable
range of a Bézier curve. Gradient descent based optimization was done using pytorch with a
modified version of the original KAN library28. For all test cases involving the gradient descent
based optimizers, only results with the optimal learning rate and number of steps were reported
based optimizers, only results with the optimal learning rate and number of steps were reported
in the main text. The optimal gradient descent based learning rates were found by iteratively
testing different learning rates for each problem from 0.001 to 1.5, with the ‘best’ denoting the
learning rate that converged on the most optimal solution first. From this, the optimal number of
steps was found by stopping learning once the loss function of the best learning rate converged
for multiple training iterations.
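A simplified sketch of this normalization and learning-rate sweep is shown below; train_kan is a hypothetical stand-in for the modified KAN training routine, and the sweep keeps the lowest converged loss rather than reproducing the exact "first to converge" criterion described above.

```python
import numpy as np

def minmax_normalize(x):
    """Scale inputs to [0, 1], the preferable input range of a Bezier curve."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def sweep_learning_rates(train_kan, rates=(0.001, 0.01, 0.1, 0.5, 1.0, 1.5)):
    """Try each learning rate and keep the one whose final loss is lowest.
    `train_kan(lr)` is a hypothetical callable returning the converged loss."""
    results = {lr: train_kan(lr) for lr in rates}
    best_lr = min(results, key=results.get)
    return best_lr, results

# Usage (with a dummy stand-in for the real training routine):
best_lr, losses = sweep_learning_rates(lambda lr: abs(np.log10(lr) + 1))
print(best_lr, losses)
```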
For annealing, the objective problem formulation was done in C++ called from Python,
which then used the pyqubo library38,39 to convert the OBJ to a QUBO matrix. Simulated
annealing was done using the D-Wave simulated annealing library, while quantum annealing used
the D-Wave Advantage2 prototype 2.3 as a pure quantum annealer and hybrid binary quadratic
model version 2 as a hybrid quantum annealer. Although quantum annealing can be run in tens of
microseconds, the large amount of quantum noise currently present in quantum systems requires
hundreds to thousands of quantum annealing runs to be done per problem. This is reflected in our
experimental timings and makes the time taken by quantum annealing go from tens of
microseconds to around a tenth of a second for our test cases. Finally, since all
optimization methods tested are highly amenable to parallel computing, this factor was
eliminated and all classical computations were done on a single thread of an Intel Xeon Scalable
Platinum 8173M processor running at a 2 GHz clock speed.
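The annealing pipeline described above can be sketched end to end as follows; the objective here is a toy expression rather than the expanded Bézier-KAN OBJ, and the commented-out samplers follow Ocean SDK conventions, which may differ from the exact solver versions used in this study.

```python
import neal
from pyqubo import Array

# Binary variables standing in for discretized control points (illustrative size).
q = Array.create('q', shape=8, vartype='BINARY')

# A toy quadratic objective; in practice this comes from the expanded Bezier-KAN OBJ.
H = sum((q[i] - q[i + 1]) ** 2 for i in range(7)) + 0.5 * q[0] * q[7]

model = H.compile()
qubo, offset = model.to_qubo()

# Simulated annealing (classical baseline); num_reads reflects the hundreds to
# thousands of reads per problem mentioned above.
sampleset = neal.SimulatedAnnealingSampler().sample_qubo(qubo, num_reads=1000)
print(sampleset.first.sample, sampleset.first.energy + offset)

# Pure quantum annealing (requires D-Wave Leap access):
# from dwave.system import DWaveSampler, EmbeddingComposite
# qpu = EmbeddingComposite(DWaveSampler())
# sampleset = qpu.sample_qubo(qubo, num_reads=1000)

# Hybrid solver for larger variable counts:
# from dwave.system import LeapHybridSampler
# sampleset = LeapHybridSampler().sample(model.to_bqm())
```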
Conflict of interest:
The authors declare that they have no conflict of interest.
References:
1. Georgopoulos, K. Quantum Computing: Understanding its Potential Applications. PECB Insights
https://insights.pecb.com/quantum-computing-understanding-potential-applications/ (2023).
2. Grover, L. K. & Sengupta, A. M. Classical analog of quantum search. Phys. Rev. A 65, 032319 (2002).
3. A new quantum algorithm for classical mechanics with an exponential speedup.
http://research.google/blog/a-new-quantum-algorithm-for-classical-mechanics-with-an-exponential-spe
edup/.
4. Shor, P. W. Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a
doi:10.1038/d41586-023-04007-0.
505–510 (2019).
8. Date, P., Arthur, D. & Pusey-Nazzaro, L. QUBO formulations for training machine learning models.
209–212 (2019).
https://doi.org/10.48550/arXiv.2405.08810 (2024).
13. Chou, C. & Lucero, E. Quantum computing: Facts, fiction and the future. (2024).
https://doi.org/10.48550/arXiv.2404.19756 (2024).
https://arxiv.org/abs/2405.12832v2 (2024).
16. Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. Preprint at
https://doi.org/10.48550/arXiv.1412.6980 (2017).
17. Yuan, W. & Gao, K.-X. EAdam Optimizer: How $\epsilon$ Impact Adam. Preprint at
https://doi.org/10.48550/arXiv.2011.02150 (2020).
18. Brantner, B. Generalizing Adam to Manifolds for Efficiently Training Transformers. Preprint at
https://doi.org/10.48550/arXiv.2305.16901 (2023).
19. Ahn, K., Zhang, Z., Kook, Y. & Dai, Y. Understanding Adam Optimizer via Online Learning of
20. Gudla, S. P. K. & Bhoi, S. K. A Study on Effect of Learning Rates Using Adam Optimizer in
LSTM Deep Intelligent Model for Detection of DDoS Attack to Support Fog Based IoT Systems. in
21. Bottou, L. Large-Scale Machine Learning with Stochastic Gradient Descent. in Proceedings of
22. Duchi, J., Hazan, E. & Singer, Y. Adaptive Subgradient Methods for Online Learning and
671–680 (1983).
25. Pelofske, E. Comparing Three Generations of D-Wave Quantum Annealers for Minor Embedded
26. Paszke, A. et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. in
27. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat.
29. Glos, A., Krawiec, A. & Zimborás, Z. Space-efficient binary optimization for variational quantum
30. Liu, Z. et al. How Do Adam and Training Strategies Help BNNs Optimization. in Proceedings of
31. Lechner, W., Hauke, P. & Zoller, P. A quantum annealing architecture with all-to-all connectivity
32. Pita-Vidal, M., Wesdorp, J. J. & Andersen, C. K. Blueprint for all-to-all connected
33. Yarkoni, S., Raponi, E., Bäck, T. & Schmitt, S. Quantum annealing for industry applications:
35. Jun, K. QUBO formulations for a system of linear equations. Results Control Optim. 14, 100380
(2024).
36. O'Malley, D. & Vesselinov, V. V. ToQ.jl: A high-level programming language for D-Wave
machines based on Julia. in 2016 IEEE High Performance Extreme Computing Conference (HPEC)
37. Jun, K. & Lee, H. HUBO formulations for solving the eigenvalue problem. Results Control
38. Zaman, M., Tanahashi, K. & Tanaka, S. PyQUBO: Python Library for Mapping Combinatorial
39. Tanahashi, K., Takayanagi, S., Motohashi, T. & Tanaka, S. Application of Ising Machines and a
S1:
Figure S1. The training times of the quantum algorithm split up by step within the algorithm, these steps being
preprocessing, quantum annealing, and post-processing. Preprocessing includes formulation of the objective function,
the forward pass, and variable separation. The datasets depicted include (a) the timing of algorithm steps in the
circle and moon datasets, used in Figure 1, and (b) regression dataset 1 from Figure 2 of the main text. As can be
seen, as the problem gets more intricate the preprocessing time becomes a majority of the quantum algorithm's runtime. We
attribute this to our implementation of the code for formulating the objective function not being fully optimized.
S2:
Many types of functions that work for a single-layer KAN network put in QUBO
format do not work for a multi-layered KAN in QUBO format. This is due to the fact that, for
a one-layer network, at the time of doing the forward pass we know the value of all inputs to each
basis function. Implementations of the B-spline, Fourier-spline, and Gaussian-spline that are
quantum optimizable on a simulated annealer are shown in Figure S2.
Figure S2. Different spline functions trained to fit y = sin(x) by creating an MSE objective function converted
to QUBO format and solved using simulated annealing.
This becomes a problem when we start to stack layers, as the output of one layer becomes
the input to another. When this happens, the points we are solving for are either expressed as
conditionals or in terms of functions which do not allow them to be expressed as either linear or
quadratic, both of which are unacceptable for QUBO.
S3:
Given that we are optimizing a 2-layer Bézier-KAN of shape [1,1,1] and we are using the
syntax B_{i,\ell} for our Bézier curves, where i is the index within the layer and \ell is the layer, we can define
the entirety of the KAN as:
z = B_{1,2}\left(B_{1,1}\right)    (S1)
If we then say all of the Bézier curves in this KAN are of order 1, we can define each Bézier
curve as:
B_{i,\ell}(t) = (1 - t) P_{0,i,\ell} + t P_{1,i,\ell}, \quad 0 \leq t \leq 1    (S2)
If we then discretize the continuous control points using only the integer representation of radix
2, with two qubits, q, per control point, we then end up with Eq. S3 for a given Bézier curve:
B_{i,\ell}(t) = (1 - t)\left(q_{0,0,i,\ell} + 2 q_{1,0,i,\ell}\right) + t\left(q_{0,1,i,\ell} + 2 q_{1,1,i,\ell}\right), \quad 0 \leq t \leq 1    (S3)
Now, considering our objective function is MSE as defined in Eq. 8, we see that we must
square Eq. S5. As this will quickly blow up into a much larger equation, we will simplify it for
this example and just take the last two terms in Eq. S5 and square them to see that the problem
quickly becomes a higher order than quadratic, with more than two q variables being multiplied
together. Keep in mind that a binary variable squared is equal to the original binary variable:
Even in this simplistic example, where we only have three unique binary variables in Eq.
S6, variables become intertwined because everything is squared by the MSE objective function, and
auxiliary variables become necessary to bring the problem back to quadratic order. This
problem only becomes more pronounced when using higher-order Bézier curves within the
KAN, as everything being input into each Bézier curve is raised to the degree of that curve,
providing more opportunities to create a higher-order problem.
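To make this blow-up concrete, the following sympy sketch expands one squared error term of the [1,1,1] degree-1 Bézier-KAN from Eq. S3 in its binary variables and checks its degree; the target value 1/2 is an arbitrary placeholder, and the variable names only loosely mirror the subscripts above.

```python
import sympy as sp

t = sp.symbols('t')
# Four binary variables of the inner curve and two of the outer curve (cf. Eq. S3).
q = sp.symbols('q0:6')

inner = (1 - t) * (q[0] + 2 * q[1]) + t * (q[2] + 2 * q[3])      # B_{1,1}(t)
outer = (1 - inner) * q[4] + inner * q[5]                        # B_{1,2}(B_{1,1})

residual_sq = sp.expand((outer - sp.Rational(1, 2)) ** 2)        # one squared error term

# Binary variables satisfy q**2 = q, so reduce exponents before checking the degree.
reduced = residual_sq
for qi in q:
    reduced = reduced.replace(qi ** 2, qi)

degree_in_q = sp.Poly(reduced, *q).total_degree()
print(degree_in_q)   # > 2: higher than quadratic, i.e. a HUBO rather than a QUBO
```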