Sparks of Quantum Advantage and Rapid Retraining in Machine Learning

‭William Troy‬‭1*‬

Abstract

The advent of quantum computing holds the potential to revolutionize various fields by solving complex problems more efficiently than classical computers. Despite this promise, practical quantum advantage is hindered by current hardware limitations, notably the small number of qubits and high noise levels. In this study, we leverage adiabatic quantum computers to optimize Kolmogorov-Arnold Networks, a powerful neural network architecture for representing complex functions with minimal parameters. By modifying the network to use Bézier curves as the basis functions and formulating the optimization problem as a Quadratic Unconstrained Binary Optimization problem, we create a fixed-size solution space that is independent of the number of training samples. Our approach demonstrates sparks of quantum advantage through faster training times compared to classical optimizers such as Adam, Stochastic Gradient Descent, Adaptive Gradient, and simulated annealing. Additionally, we introduce a novel rapid retraining capability, enabling the network to be retrained with new data without reprocessing old samples, thus enhancing learning efficiency in dynamic environments. Experimental results on initial training of classification and regression tasks validate the efficacy of our approach, showcasing significant speedups and comparable performance to classical methods. Experiments on retraining demonstrate a sixty-fold speedup using adiabatic quantum computing based optimization compared to the gradient descent based optimizers, with theoretical models suggesting this speedup could be even larger. Our findings suggest that, with further advancements in quantum hardware and algorithm optimization, quantum-optimized machine learning models could have broad applications across various domains, with an initial focus on rapid retraining.

1. Introduction

The advent of quantum computing (QC) promises to revolutionize various fields by solving complex problems more efficiently than classical computers. This is possible because quantum computers leverage the principles of superposition and entanglement to perform computations that would be infeasible for classical computers, potentially offering exponential speedups for certain types of problems1–3. For instance, Shor's algorithm for factoring large numbers can theoretically break widely used cryptographic systems much faster than the best classical algorithms4. Despite this promise, the practical realization of quantum advantage is hindered by current limitations in quantum hardware, notably the relatively small number of qubits and high noise levels on modern quantum processors. These limitations make it challenging to solve large-scale problems and require highly optimized algorithms to make the most of the existing quantum resources5. To date, quantum advantage has only been demonstrated a couple of times, notably by Google's Sycamore processor and in Gaussian boson sampling experiments6,7.

1 Independent Researcher
* Corresponding Author: [email protected]

While these achievements mark significant milestones, quantum advantage in other domains remains difficult to achieve. A particularly promising area of application for QC is quantum machine learning (QML), which integrates quantum computing with machine learning techniques to leverage quantum speedups for training and inference tasks8–10. Platforms like TensorFlow and Qiskit have already started integrating quantum models, providing tools for researchers to explore QML applications11,12. Despite these advancements, Google recently highlighted a significant gap in the field at Google I/O 2024, stating that no one has yet demonstrated a clear quantum advantage for machine learning on classical data13. This gap underscores the need for innovative approaches that can bridge this divide and showcase the practical benefits of quantum computing in machine learning.

In this study, we address this challenge by leveraging adiabatic quantum computing to optimize Bézier Kolmogorov-Arnold Networks (KANs). KANs represent a powerful neural network architecture known for their unique ability to represent complex functions with a relatively small number of parameters14,15. However, due to the complexities of the model, the optimization of KANs is currently quite slow compared to that of multilayer perceptrons. By reformulating the optimization problem from a classical backpropagation algorithm to a Quadratic Unconstrained Binary Optimization (QUBO) problem, we are able to create a fixed-size solution space that is independent of the number of training samples. Using D-Wave quantum annealers as our adiabatic quantum computers, we are then able to find the optimal weights for this network, where the number of qubits and the run time on the quantum computer are independent of the number of training samples.

Our approach demonstrates sparks of quantum advantage when testing on small models that are able to fit within the currently limited number of qubits available. This demonstration is achieved through faster training times compared to classical optimizers. These classical optimizers include the Adam optimizer16–20, one of the most popular optimization strategies, as well as Stochastic Gradient Descent21 (SGD), Adaptive Gradient22 (AdaGrad), and simulated annealing23,24, the classical parallel to quantum annealing in optimization. Moreover, we introduce a novel rapid retraining capability, enabled by the ability to retrain with new data without the need to reprocess old samples. This feature is particularly advantageous in dynamic environments where data is continuously evolving, such as real-time analytics, because a quantum annealer can perform an entire optimization in tens of microseconds25.

2. Results

We conducted a series of experiments to evaluate the performance of our Bézier-KANs optimized using different methods: backpropagation with continuous-variable gradient descent based optimizers via PyTorch26, QUBO with simulated annealing, and QUBO with quantum annealing. Our results cover five separate tasks, which include two classification tasks with 1-layer KANs and three regression tasks using multi-layer KANs. Additionally, we explore the rapid retraining capability of our KAN formulation and its impact on training speed. The tasks chosen remain relatively simple due to the extremely limited number of quality qubits available in current hardware.

‭2.1. Classification Tasks‬


The classification tasks were chosen from common classification problem datasets available in SciPy27, with one of them also being an example dataset in the pykan library28. For the classification tasks we used 1-layer KANs of the shape [2,1], as this was the same structure used in the pykan classification examples. The training time and performance of the different optimization methods were evaluated based on accuracy, precision, recall, and F1-score (Figure 1). Despite the limited number of qubits used to create the discrete Bézier coefficients, quantum and simulated annealing achieve comparable performance to that of the continuous-coefficient gradient descent based optimizers (Figures 1a and 1b). On top of this, the training time of the quantum method is significantly faster than all other tested methods, with the quantum method achieving training speeds over 2.5 times faster than the gradient descent based optimizers and over 17 times faster than the simulated annealer.

‭Figure 1.‬‭The training and testing metrics of Bézier-KANs‬‭optimized using classical optimizers and quantum‬
‭annealing on (a) the circle dataset and (b) the moon dataset. (c) The respective training times for these different‬
‭optimization scenarios.‬

‭2.2. Regression Tasks‬


For regression dataset 1, given by Eq. 11, and regression dataset 2, the spherical harmonic function, we evaluated the performance of the different optimization methods based on MSE and R-squared (Figures 2a-2d). Additionally, to simulate an environment in which we may want to retrain our model several times, we also performed rapid retraining. To do this we trained initial models on regression datasets 1 and 2, after which we iteratively introduced new data to the respective datasets and retrained the models, recording the time taken for each retraining and the total time spent training the model (Figures 2e and 2f).

Figure 2. (a) MSE and (b) R-squared values of Bézier-KANs trained on regression dataset 1 using classical optimizers and hybrid quantum annealing. (c) MSE and (d) R-squared values of Bézier-KANs trained on regression dataset 2 using an Adam optimizer, simulated annealing, and pure quantum annealing. (e) Time taken for initial training and retraining using our different optimizers on regression dataset 1 and (f) on regression dataset 2. The initial training had 1,000,000 data points while each retraining added 100,000 new data points.
It should be mentioned that the hybrid quantum annealer used here finds solutions in around three seconds, as opposed to the tens of microseconds of a pure quantum annealer. It was necessary to use a hybrid quantum annealer for some of our regression tasks as they use a greater number of variables, and hybrid quantum annealers can more consistently solve for a larger number of variables than current pure quantum annealers can. Despite this large time increase introduced to the quantum solutions by using a hybrid annealer, our hybrid method is able to perform all four rounds of iterative retraining in an eighth of the time taken to perform a single round of training by the gradient descent based optimizers for regression dataset 1. On top of this, even the simulated annealer dramatically outperforms the gradient descent based optimizers on these tasks, due to not having to reprocess already seen samples. This is taken a step further for dataset 2, where the pure quantum annealer is able to perform all four rounds of retraining in less than 2% of the time taken by the gradient descent based optimizers to perform the same task.

Regression dataset 3, given by Eq. 12, involved a more complex KAN with a larger number of binary variables, over 200. Due to this large number of binary variables, current quantum annealers struggled to encode the task well and converge, while the simulated annealer managed to find a solution if allowed to run enough times. To explore this, we varied the degree of one of the Bézier curves on the bottom layer from one to two, leaving the rest of the network untouched with the other bottom Bézier curve and the top Bézier curve having degrees of two and one, respectively, and performed 50 hybrid optimizations for each of the two modified networks (Figure 3).

Figure 3. Distribution of (a) MSE and (b) R-squared values for 100 hybrid optimizations of a Bézier-KAN, where the degree of a Bézier curve in the first layer was changed from 1 to 2.

From this it can be seen that, contrary to what one would expect, increasing the flexibility of the model by giving it access to higher degrees can actually hurt the model by increasing the chance of finding a less optimal solution. Although this is unexpected, it is attributed to the current state of quantum annealers, in which the number of variables is limited while both noise and chain breaks are very prevalent. For comparison, simulated annealing was able to find a solution with an MSE value of 0.0041 and an R-squared of 0.963 for the case where the modified Bézier curve has a degree of one.

3. Discussion

Despite the limitations in the number and quality of current qubits, our work demonstrates that it is already possible to create machine learning frameworks with training speeds that outperform current classical methods in specific scenarios while maintaining comparable performance. This advantage is particularly evident when retraining networks with additional data: pure quantum annealing, hybrid quantum annealing, and simulated annealing all perform retraining faster than the gradient descent based optimizers, with pure quantum annealing having the fastest retraining times. This success is achieved by creating fixed-size solution spaces with states that can be saved and rapidly solved using an annealer. We anticipate further improvements in rapid retraining speed as advancements in quantum hardware allow pure quantum annealers to accurately solve more complex problems.

In theory we should be seeing even larger speedups. This is because all that is needed for our quantum optimization is a single forward pass of the network before quantum annealing. Given that theoretical annealing runs in a fraction of a second regardless of the problem size, the main time component of the quantum optimization framework is that singular forward pass. Comparing this to classical backpropagation algorithms, which require both a forward and a backward pass for each training epoch, it is easy to see that we should be seeing much greater experimental speedups. As our experiments require 50 to 150 epochs of classical training, even without including the time complexity of gradient calculations in the backward passes, one would expect to see speedups on the scale of the number of training epochs.

We attribute this discrepancy between theoretical speedups and what we see experimentally to code optimization. Given that data preprocessing constitutes the vast majority of the runtime in most of our experiments (supplementary material S1), we anticipate that further optimization of our code will lead to experimental speedups closer to the theoretical ones.

There is also significant room for improvement of this algorithm, with additional research focused on optimizing the quantum annealing process, reducing the number of auxiliary variables, problem decomposition, implementing similar approaches on gate-based quantum computers29, and exploring the continuous optimization space, which is known to be simpler than the discrete space30. Perhaps the most promising area of all for advancing this methodology is all-to-all connected quantum annealers31,32. All-to-all connectivity in quantum annealers promises to drastically reduce, if not eliminate, the need for auxiliary variables when solving HUBO problems33. This would theoretically eliminate the scalability issue of our algorithm caused by the increasing need for auxiliary variables as the number of layers in a network increases, allowing for a one-to-one qubit-to-variable representation of the entire system.

With these things in mind, we foresee this as just the first of a new class of quantum-optimizable machine learning models that create fixed-size solution spaces. Exploring alternative KAN formulations using other spline functions, Taylor series expansions of spline basis functions, other objective functions, or entirely different machine learning models could lead to the development of more models that fit within this class.

As the potential applications of this model type span various fields, particularly where rapid retraining can have a large effect, including finance, healthcare, environmental monitoring, and cybersecurity, there is a large incentive to expand on and optimize our proposed model type. Along with this, although we already demonstrate experimental speedups in the training time of our model compared to classical optimization, continued advancements in quantum computing will naturally enhance the performance of our model, allowing for a potential future in which complex machine learning networks could be trained and retrained on quantum computers at speeds dramatically faster than their classical counterparts are capable of.

4. Methods

KANs are a neural network architecture designed to represent complex functions with a minimal number of parameters. To achieve this representation, KANs build networks that are composed of basis functions. The original KAN representation used B-spline basis functions summed with weighted activation functions to manipulate the edges of a network. However, it has also been shown that many other types of basis functions, including Fourier-splines34 and wavelets15, can be used to create KANs. While we are able to use our end formulation to optimize one-layer KANs with many different basis functions, including B-splines, Fourier-splines, and Gaussian radial basis function splines (supplementary material S2), these spline types are not able to fit into the QUBO framework when stacked. This is because they are composed of either non-differentiable base functions when fully expanded or functions whose variables cannot be expressed in linear and quadratic terms alone. While this is not a problem for traditional optimization methods, as we will see, it becomes a significant issue when optimizing a multi-layered KAN using QUBO methods.

‭4.1. Quadratic Unconstrained Binary Optimization (QUBO) Problem‬


QUBO is a mathematical formulation used to represent optimization problems where the objective is to find the minimum energy of a quadratic polynomial over binary variables. These problem setups represent energy functions that are extremely well-suited for quantum annealers, such as those developed by D-Wave, because they can exploit quantum parallelism, quantum tunneling, and quantum entanglement to search for the global minimum more efficiently than classical methods. In general, for quantum annealers this energy function can be written as:

$$f(\vec{q}) = \vec{q}^{\,T} Q \vec{q} \quad (1)$$

where $Q$ is an upper triangular matrix, $\vec{q} = (q_1, ..., q_N)^T$, and $q_i$ represents binary variables. Given that $q_i \in \{0, 1\}$, $q_i^2 = q_i$, which leads us to the QUBO formulation of this energy function:

$$f(\vec{q}) = \sum_{i=1}^{N} Q_{i,i}\, q_i + \sum_{i<j} Q_{i,j}\, q_i q_j \quad (2)$$

where $Q_{i,i}$ and $Q_{i,j}$ denote the diagonal and off-diagonal terms, respectively, of a QUBO matrix35.

These binary variables in our case represent the control points in the basis functions that make up the KANs. These continuous control points can be made into binary variables using multiple forms of binary discretization, commonly radix-2 encoding:

$$x_i \approx \sum_{l=-m}^{m} 2^l q_{i,l} \quad (3)$$

where $x_i$ is any positive real number and $2m + 1$ is the number of qubits used to represent $x_i$36. From this one can see that the encoding can be expanded to include negative numbers by creating $q^{+}$ and $q^{-}$ variables to represent positive and negative values, respectively:

$$x_i \approx \sum_{l=-m}^{m} \left( 2^l q^{+}_{i,l} - 2^l q^{-}_{i,l} \right) \quad (4)$$
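
To make the discretization concrete, the following is a minimal sketch in Python of decoding a signed radix-2 bit string back into a real value as in Eq. 4; the greedy encoder is only an illustration and is not taken from the released code, and the helper names are hypothetical.

```python
import numpy as np

def decode_radix2(q_plus, q_minus, m):
    """Decode signed radix-2 bits into a real value, as in Eq. 4.

    q_plus, q_minus: arrays of 0/1 bits for exponents l = -m, ..., m.
    """
    weights = 2.0 ** np.arange(-m, m + 1)
    return float(np.dot(weights, q_plus) - np.dot(weights, q_minus))

def encode_radix2(x, m):
    """Greedy illustration: approximate a real x with signed radix-2 bits."""
    q_plus = np.zeros(2 * m + 1, dtype=int)
    q_minus = np.zeros(2 * m + 1, dtype=int)
    target, remainder = (q_plus, x) if x >= 0 else (q_minus, -x)
    # Assign bits from the largest power of two (2**m) down to 2**(-m).
    for idx, l in enumerate(range(m, -m - 1, -1)):
        if remainder >= 2.0 ** l:
            target[2 * m - idx] = 1   # position of exponent l in the l = -m..m ordering
            remainder -= 2.0 ** l
    return q_plus, q_minus

q_p, q_m = encode_radix2(-1.3, m=2)   # bits for exponents 2**-2 ... 2**2
print(decode_radix2(q_p, q_m, m=2))   # approximately -1.25
```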

While QUBO problems are ideal for quantum annealing, formatting a KAN optimization problem into one presents several unique challenges. The first difficulty is that the variables of interest must be expressible in only linear or quadratic terms. The second difficulty lies in the need to fully expand the basis functions of a KAN into their component terms. This expansion is crucial as the entire KAN can be thought of as one large equation with each layer feeding into the next. This leads to a problem when the functions used to make a layer are discontinuous and rely on knowing the output of the previous layer. Unlike traditional neural network optimization, where you know the current output of a layer as weights are iteratively adjusted to minimize the loss function via gradient descent, QUBO does not initialize the weights before solving. As QUBO solves for all weights at the same time, on what can be thought of as a black box, this approach eliminates the iterative modification of variables which is so time consuming in backpropagation. However, this QUBO formulation brings in the extra requirements that variables be expressed in linear or quadratic terms and that all functions involved be continuously differentiable.

‭4.2. Bézier Curves as a Solution‬


To address these issues presented by the QUBO formulation, we utilize Bézier curves as the basis functions for our KANs. Bézier curves are continuously differentiable in their expanded form and allow variables to be expressed in terms of just coefficients and control points, making them suitable for multi-layer optimization using QUBO methods. A Bézier curve of degree $n$ is given by:

$$B(t) = \sum_{i=0}^{n} \binom{n}{i} (1-t)^{n-i}\, t^i\, P_i \quad (5)$$

where $P_i$ are the control points and $t \in [0, 1]$. To formulate this into a QUBO format, the continuous $P$ variables are then discretized using variations of radix-2 encoding. This makes the theoretical number of logical qubits needed to solve the problem directly related to the number of control points in a given problem and the number of binary variables used to represent each control point.

4.3. Auxiliary Variables and HUBO Formulation

From this, one can see how this may lead to variable interactions beyond the quadratic order, the maximum order that a QUBO formulation can solve for, creating a Higher Order Unconstrained Binary Optimization (HUBO) problem (supplementary material S3). While this may seem like a major problem, one can get around it by introducing auxiliary variables, with a single auxiliary variable used to represent the product of two or more others. An auxiliary variable that represents the product of two other variables can be given by:

$$V_{AUX} = P_1 P_2 \quad (6)$$

This relation is enforced by adding penalty terms to the QUBO problem in the form of:

$$Penalty = w \left( P_1 P_2 - 2 V_{AUX} P_1 - 2 V_{AUX} P_2 + 3 V_{AUX} \right) \quad (7)$$

where $w$ is the penalty coefficient37, set in this study to 15-25 times the maximum coefficient value in our QUBO matrix.

This makes the total number of logical qubits needed to optimize a network equal to the number of control points multiplied by the number of binary variables used to represent each control point, plus the number of auxiliary variables:

$$\mathrm{num}_{\mathrm{qubits}} = \mathrm{num}_{\mathrm{aux\_vars}} + \sum_{i=1}^{\mathrm{num}_{\mathrm{control\_vars}}} \mathrm{num}_{\mathrm{binary\_vars},i} \quad (8)$$
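
As an illustration, the sketch below (Python, with hypothetical variable names, not taken from the released code) shows how the quadratic penalty of Eq. 7 can be added to a QUBO coefficient dictionary so that an auxiliary variable is forced to equal the product of two binary variables in any low-energy solution.

```python
def add_product_penalty(qubo, p1, p2, aux, w):
    """Add w*(p1*p2 - 2*aux*p1 - 2*aux*p2 + 3*aux) to a QUBO dict (Eq. 7).

    qubo maps variable pairs (a, b) to coefficients; (v, v) holds linear terms.
    """
    qubo[(p1, p2)] = qubo.get((p1, p2), 0.0) + w
    qubo[(aux, p1)] = qubo.get((aux, p1), 0.0) - 2.0 * w
    qubo[(aux, p2)] = qubo.get((aux, p2), 0.0) - 2.0 * w
    qubo[(aux, aux)] = qubo.get((aux, aux), 0.0) + 3.0 * w
    return qubo

# The penalty is zero whenever aux == p1*p2 and at least w otherwise,
# so a large enough w makes violating assignments energetically unfavorable.
qubo = {}
add_product_penalty(qubo, "q_p1", "q_p2", "v_aux", w=20.0)
print(qubo)
```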

‭4.4. Formulation of The QUBO Objective Function‬


In order to create a QUBO optimization function, one must define an objective function ($OBJ$) to be optimized. In our case we used the mean squared error (MSE) as our machine learning objective. MSE was chosen because, unlike many other traditional machine learning objective functions, it is fully continuous. This makes our objective take the form of:

$$OBJ = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \quad (9)$$

where $n$ is the number of training samples, $y_i$ is the desired output at sample $i$, and $\hat{y}_i$ is the network output. This objective function is a perfectly valid one and is commonly used in machine learning; however, if optimized to its fullest extent it can easily lead to overfitting of the training data. Since we cannot check for overfitting between training iterations, as we can classically, we re-formulate the objective function so that it pays attention to a validation dataset on its own:

$$OBJ = \frac{1}{n_t} \sum_{t=1}^{n_t} \left( y_t - \hat{y}_t \right)^2 + \frac{\lambda}{n_v} \sum_{v=1}^{n_v} \left( y_v - \hat{y}_v \right)^2 \quad (10)$$

with $n_t$ and $n_v$ denoting the number of training and validation samples, respectively, and $\lambda$ representing the validation dataset weighting.
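
As a concrete illustration of this step, the following is a minimal sketch (not the released implementation) that builds the MSE objective of Eq. 9 for a single degree-1 Bézier curve with radix-2 encoded control points and converts it to a QUBO matrix with pyqubo, the library this work used for that conversion; the variable names and toy data are illustrative.

```python
from pyqubo import Binary

# Two-bit, non-negative radix-2 encoding of the two control points of a
# degree-1 Bézier curve: P = q0 + 2*q1 (an illustrative choice of precision).
P0 = Binary("p0_b0") + 2 * Binary("p0_b1")
P1 = Binary("p1_b0") + 2 * Binary("p1_b1")

# Toy training data (t, y) with t already normalized to [0, 1].
samples = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]

# MSE objective of Eq. 9: the forward pass only involves the known inputs t,
# so the squared expression stays quadratic in the binaries for this 1-layer case.
obj = (1.0 / len(samples)) * sum(((1 - t) * P0 + t * P1 - y) ** 2
                                 for t, y in samples)

model = obj.compile()
qubo, offset = model.to_qubo()   # dict of QUBO coefficients plus a constant offset
print(len(qubo), "QUBO terms, constant offset =", round(offset, 3))
```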

‭4.5. Rapid Retraining Capability‬


Given an initial dataset, the expanded equation that represents the KAN is created and all available data points are put through a forward pass, after which objective function 1 ($OBJ_1$) is saved and a quantum annealer is used to solve for the weights given the input dataset(s). When new data becomes available, the expanded equation that represents the KAN and $OBJ_1$ are reloaded. The new data is put through a forward pass and the results of this forward pass are added to $OBJ_1$, creating $OBJ_2$. Once this is done the new objective, $OBJ_2$, is sent to a quantum annealer for optimization. With each trial of the optimizer taking tens of microseconds, this creates an extremely fast retraining strategy without needing to reprocess already seen samples. Additionally, this allows for the removal of samples if desired by simply putting them through a forward pass and then subtracting their contribution from the $OBJ$.
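
A minimal sketch of this bookkeeping, under the simplifying assumption of a 1-layer network whose output is linear in the encoded binary variables (variable and file names are illustrative, not from the released code): each sample contributes a fixed set of quadratic and linear coefficients, so retraining only requires adding the contributions of the new samples to the saved objective.

```python
import numpy as np

def sample_features(t):
    """Forward-pass features a(t) such that y_hat = a(t) . q for a degree-1
    Bézier curve whose two control points use 2-bit radix-2 encoding."""
    return np.array([(1 - t), 2 * (1 - t), t, 2 * t])

def add_samples(Q, lin, ts, ys):
    """Accumulate MSE contributions (y - a.q)^2 into quadratic/linear terms.
    The constant y^2 offset is dropped since it does not affect the argmin."""
    for t, y in zip(ts, ys):
        a = sample_features(t)
        Q += np.outer(a, a)        # quadratic (and, via q_k^2 = q_k, diagonal) terms
        lin += -2.0 * y * a        # linear terms
    return Q, lin

# Initial training: accumulate OBJ_1 once and save it.
Q, lin = np.zeros((4, 4)), np.zeros(4)
Q, lin = add_samples(Q, lin, ts=[0.0, 0.5, 1.0], ys=[1.0, 2.0, 3.0])
np.savez("obj1.npz", Q=Q, lin=lin)

# Rapid retraining: reload OBJ_1 and add only the new samples to form OBJ_2.
saved = np.load("obj1.npz")
Q2, lin2 = add_samples(saved["Q"].copy(), saved["lin"].copy(),
                       ts=[0.25, 0.75], ys=[1.5, 2.5])
# Q2 and lin2 now define OBJ_2, ready to hand to an annealer.
```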

‭4.6. Experimental Setup‬


We conducted experiments using two classification datasets and three regression datasets, each with varying sizes and complexities that could be solved given the limited number of available qubits. For the classification tasks these datasets consist of the circle and moon datasets given by the popular SciPy package, while for regression all tasks were selected from the original KAN paper. Regression dataset 1 is given by Eq. 11, dataset 2 is given by the spherical harmonic function with m and n set to 0 and 1, respectively, and dataset 3 is given by Eq. 12.

$$z = \frac{3x}{\exp(y) + \exp(-y)} \quad (11)$$

$$z = 2\sqrt{1 + x^2 + y^2} \quad (12)$$

The dataset inputs were normalized from 0 to 1 so that inputs are within the preferable range of a Bézier curve. Gradient descent based optimization was done using PyTorch with a modified version of the original KAN library28. For all test cases involving the gradient descent based optimizers, only results with the optimal learning rate and number of steps were reported in the main text. The optimal gradient descent based learning rates were found by iteratively testing different learning rates for each problem from 0.001 to 1.5, with the 'best' denoting the learning rate that converged on the most optimal solution first. From this, the optimal number of steps was found by stopping learning once the loss function of the best learning rate converged for multiple training iterations.

For annealing, the objective problem formulation was done in C++ called from Python, which then used the pyqubo library38,39 to convert the $OBJ$ to a QUBO matrix. Simulated annealing was done using the D-Wave simulated annealing library, while quantum annealing used the D-Wave Advantage2 prototype 2.3 as a pure quantum annealer and hybrid binary quadratic model version 2 as a hybrid quantum annealer. Although quantum annealing can be run in tens of microseconds, the large amount of quantum noise currently present in quantum systems requires hundreds to thousands of quantum annealing runs per problem. This is reflected in our experimental timings and raises the time taken by quantum annealing from tens of microseconds to around a tenth of a second for our test cases. Finally, since all optimization methods tested are highly amenable to parallel computing, this factor was eliminated and all classical computations were done on a single thread of an Intel Xeon Scalable Platinum 8173M processor running at a 2 GHz clock speed.
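
For context, a minimal sketch of how a QUBO like the one produced above can be submitted to the D-Wave samplers described in this section (Ocean SDK calls as typically documented; the toy coefficients and read counts are illustrative, and Leap access is assumed for the hardware solvers):

```python
import neal

# 'qubo' is a dict mapping variable pairs to coefficients, e.g. from pyqubo's to_qubo().
qubo = {("q0", "q0"): -1.0, ("q1", "q1"): -1.0, ("q0", "q1"): 2.0}

# Classical baseline: D-Wave's simulated annealing sampler.
sa_result = neal.SimulatedAnnealingSampler().sample_qubo(qubo, num_reads=100)
print("simulated annealing:", sa_result.first.sample, sa_result.first.energy)

# Pure quantum annealing (requires D-Wave Leap access); many reads are used
# to average over hardware noise, as described above.
# from dwave.system import DWaveSampler, EmbeddingComposite
# qa_result = EmbeddingComposite(DWaveSampler()).sample_qubo(qubo, num_reads=1000)

# Hybrid quantum-classical solver for larger variable counts.
# from dwave.system import LeapHybridSampler
# hybrid_result = LeapHybridSampler().sample_qubo(qubo)
```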

Software Availability Statement:
Code used in this study is available at https://github.com/wtroy2/Quantum-KAN

Conflict of interest:
The authors declare that they have no conflict of interest.

References:
1. Georgopoulos, K. Quantum Computing: Understanding its Potential Applications. PECB Insights

‭https://insights.pecb.com/quantum-computing-understanding-potential-applications/ (2023).‬

‭2.‬ ‭Grover, L. K. & Sengupta, A. M. Classical analog of quantum search.‬‭Phys. Rev. A‬‭65‬‭, 032319 (2002).‬

‭3.‬ ‭A new quantum algorithm for classical mechanics with an exponential speedup.‬

‭http://research.google/blog/a-new-quantum-algorithm-for-classical-mechanics-with-an-exponential-spe‬

‭edup/.‬

‭4.‬ ‭Shor, P. W. Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a‬

‭Quantum Computer.‬‭SIAM J. Comput.‬‭26‬‭, 1484–1509 (1997).‬

‭5.‬ ‭Castelvecchi, D. The AI–quantum computing mash-up: will it revolutionize science?‬‭Nature‬‭(2024)‬

‭doi:10.1038/d41586-023-04007-0.‬

‭6.‬ ‭Arute, F.‬‭et al.‬‭Quantum supremacy using a programmable‬‭superconducting processor.‬‭Nature‬‭574‬‭,‬

‭505–510 (2019).‬

‭7.‬ ‭Madsen, L. S.‬‭et al.‬‭Quantum computational advantage‬‭with a programmable photonic processor.‬

‭Nature‬‭606‬‭, 75–81 (2022).‬

‭8.‬ ‭Date, P., Arthur, D. & Pusey-Nazzaro, L. QUBO formulations for training machine learning models.‬

‭Sci. Rep.‬‭11‬‭, 10029 (2021).‬


‭9.‬ ‭Huang, H.-Y.‬‭et al.‬‭Quantum advantage in learning from experiments.‬‭Science‬‭376‬‭, 1182–1186 (2022).‬

‭10.‬ ‭Havlíček, V.‬‭et al.‬‭Supervised learning with quantum-enhanced‬‭feature spaces.‬‭Nature‬‭567‬‭,‬

‭209–212 (2019).‬

‭11.‬‭Broughton, M.‬‭et al.‬‭TensorFlow Quantum: A Software‬‭Framework for Quantum Machine Learning.‬

‭Preprint at https://doi.org/10.48550/arXiv.2003.02989 (2021).‬

‭12.‬ ‭Javadi-Abhari, A.‬‭et al.‬‭Quantum computing with‬‭Qiskit. Preprint at‬

‭https://doi.org/10.48550/arXiv.2405.08810 (2024).‬

‭13.‬ ‭Chou, C. & Lucero, E. Quantum computing: Facts, fiction and the future. (2024).‬

‭14.‬ ‭Liu, Z.‬‭et al.‬‭KAN: Kolmogorov-Arnold Networks.‬‭Preprint at‬

‭https://doi.org/10.48550/arXiv.2404.19756 (2024).‬

‭15.‬ ‭Bozorgasl, Z. & Chen, H. Wav-KAN: Wavelet Kolmogorov-Arnold Networks.‬‭arXiv.org‬

‭https://arxiv.org/abs/2405.12832v2 (2024).‬

‭16.‬ ‭Kingma, D. P. & Ba, J. Adam: A Method for Stochastic Optimization. Preprint at‬

‭https://doi.org/10.48550/arXiv.1412.6980 (2017).‬

‭17.‬ ‭Yuan, W. & Gao, K.-X. EAdam Optimizer: How $\epsilon$ Impact Adam. Preprint at‬

‭https://doi.org/10.48550/arXiv.2011.02150 (2020).‬

‭18.‬ ‭Brantner, B. Generalizing Adam to Manifolds for Efficiently Training Transformers. Preprint at‬

‭https://doi.org/10.48550/arXiv.2305.16901 (2023).‬

‭19.‬ ‭Ahn, K., Zhang, Z., Kook, Y. & Dai, Y. Understanding Adam Optimizer via Online Learning of‬

‭Updates: Adam is FTRL in Disguise. Preprint at https://doi.org/10.48550/arXiv.2402.01567 (2024).‬

‭20.‬ ‭Gudla, S. P. K. & Bhoi, S. K. A Study on Effect of Learning Rates Using Adam Optimizer in‬

‭LSTM Deep Intelligent Model for Detection of DDoS Attack to Support Fog Based IoT Systems. in‬

‭Computing, Communication and Learning‬‭(eds. Panda,‬‭S. K. et al.) 27–38 (Springer Nature‬

‭Switzerland, Cham, 2022). doi:10.1007/978-3-031-21750-0_3.‬

‭21.‬ ‭Bottou, L. Large-Scale Machine Learning with Stochastic Gradient Descent. in‬‭Proceedings of‬

‭COMPSTAT’2010‬‭(eds. Lechevallier, Y. & Saporta, G.)‬‭177–186 (Physica-Verlag HD, Heidelberg,‬


‭2010). doi:10.1007/978-3-7908-2604-3_16.‬

‭22.‬ ‭Duchi, J., Hazan, E. & Singer, Y. Adaptive Subgradient Methods for Online Learning and‬

‭Stochastic Optimization.‬‭J Mach Learn Res‬‭12‬‭, 2121–2159‬‭(2011).‬

‭23.‬ ‭Kirkpatrick, S., Gelatt, C. D. & Vecchi, M. P. Optimization by Simulated Annealing.‬‭Science‬‭220‬‭,‬

‭671–680 (1983).‬

‭24.‬ ‭Bertsimas, D. & Tsitsiklis, J. Simulated Annealing.‬‭Stat. Sci.‬‭8‬‭, 10–15 (1993).‬

‭25.‬ ‭Pelofske, E. Comparing Three Generations of D-Wave Quantum Annealers for Minor Embedded‬

‭Combinatorial Optimization Problems. Preprint at https://doi.org/10.48550/arXiv.2301.03009 (2023).‬

‭26.‬ ‭Paszke, A.‬‭et al.‬‭PyTorch: An Imperative Style,‬‭High-Performance Deep Learning Library. in‬

‭Advances in Neural Information Processing Systems‬‭vol. 32 (Curran Associates, Inc., 2019).‬

‭27.‬ ‭Virtanen, P.‬‭et al.‬‭SciPy 1.0: fundamental algorithms‬‭for scientific computing in Python.‬‭Nat.‬

‭Methods‬‭17‬‭, 261–272 (2020).‬

‭28.‬ ‭KindXiaoming/pykan: Kolmogorov Arnold Networks. https://github.com/KindXiaoming/pykan.‬

‭29.‬ ‭Glos, A., Krawiec, A. & Zimborás, Z. Space-efficient binary optimization for variational quantum‬

‭computing.‬‭Npj Quantum Inf.‬‭8‭,‬ 1–8 (2022).‬

‭30.‬ ‭Liu, Z.‬‭et al.‬‭How Do Adam and Training Strategies‬‭Help BNNs Optimization. in‬‭Proceedings of‬

‭the 38th International Conference on Machine Learning‬‭6936–6946 (PMLR, 2021).‬

‭31.‬ ‭Lechner, W., Hauke, P. & Zoller, P. A quantum annealing architecture with all-to-all connectivity‬

‭from local interactions.‬‭Sci. Adv.‬‭1‭,‬ e1500838 (2015).‬

‭32.‬ ‭Pita-Vidal, M., Wesdorp, J. J. & Andersen, C. K. Blueprint for all-to-all connected‬

‭superconducting spin qubits. Preprint at https://doi.org/10.48550/arXiv.2405.09988 (2024).‬

‭33.‬ ‭Yarkoni, S., Raponi, E., Bäck, T. & Schmitt, S. Quantum annealing for industry applications:‬

‭introduction and review.‬‭Rep. Prog. Phys.‬‭85‬‭, 104001‬‭(2022).‬

‭34.‬ ‭Noesis, G. GistNoesis/FourierKAN. (2024).‬

‭35.‬ ‭Jun, K. QUBO formulations for a system of linear equations.‬‭Results Control Optim.‬‭14‬‭, 100380‬

‭(2024).‬
‭36.‬ ‭O’Malley, D. & Vesselinov, V. V. ToQ.jl: A high-level programming language for D-Wave‬

‭machines based on Julia. in‬‭2016 IEEE High Performance‬‭Extreme Computing Conference (HPEC)‬

‭1–7 (2016). doi:10.1109/HPEC.2016.7761616.‬

‭37.‬ ‭Jun, K. & Lee, H. HUBO formulations for solving the eigenvalue problem.‬‭Results Control‬

‭Optim.‬‭11‬‭, 100222 (2023).‬

‭38.‬ ‭Zaman, M., Tanahashi, K. & Tanaka, S. PyQUBO: Python Library for Mapping Combinatorial‬

‭Optimization Problems to QUBO Form.‬‭IEEE Trans. Comput.‬‭71‬‭, 838–850 (2022).‬

‭39.‬ ‭Tanahashi, K., Takayanagi, S., Motohashi, T. & Tanaka, S. Application of Ising Machines and a‬

‭Software Development for Ising Machines.‬‭J. Phys.‬‭Soc. Jpn.‬‭88‬‭, 061010 (2019).‬


‭Sparks of Quantum Advantage and Rapid Retraining in Machine‬
‭Learning Supplementary‬

‭S1:‬

Figure S1. The training times of the quantum algorithm split up by step within the algorithm. These steps are preprocessing, quantum annealing, and post-processing. Preprocessing includes formulation of the objective function, the forward pass, and variable separation. The datasets depicted include (a) the timing of algorithm steps on the circle and moon datasets, used in Figure 1, and (b) regression dataset 1 from Figure 2 of the main text. As can be seen, as the problem gets more intricate the preprocessing time becomes a majority of the quantum algorithm's runtime. We attribute this to our implementation of the code for formulating the objective function not being fully optimized.

S2:
Many types of functions work for a single-layer KAN network that is to be put in QUBO format but do not work for a multi-layered KAN in QUBO format. This is due to the fact that for a one-layer network, at the time of doing the forward pass we know the value of all inputs to each basis function. Implementations of the B-spline, Fourier-spline, and Gaussian-spline that are quantum optimizable on a simulated annealer are shown in Figure S2.

Figure S2. Different spline functions trained to fit $y = \sin(x)$ by creating an MSE objective function, converting it to QUBO format, and solving it using simulated annealing.

This becomes a problem when we start to stack layers, as the output of one layer becomes the input to another. When this happens, the points we are solving for are either expressed as conditionals or in terms of functions that cannot be expressed as linear or quadratic, both of which are unacceptable for QUBO.

S3:
Given that we are optimizing a 2-layer Bézier-KAN of shape [1,1,1], and using the syntax $B_{i,\ell}$ for our Bézier curves, where $i$ is the index within the layer and $\ell$ is the layer, we can define the entirety of the KAN as:

$$z = B_{1,2}\left( B_{1,1} \right) \quad (S1)$$

If we then say all of the Bézier curves in this KAN are of order 1, we can define each Bézier curve as:

$$B_{i,\ell}(t) = (1 - t) P_{0,i,\ell} + t P_{1,i,\ell}, \quad 0 \le t \le 1 \quad (S2)$$

If we then discretize the continuous control points using only the integer representation of radix 2, with two qubits $q$ per control point, we end up with Eq. S3 for a given Bézier curve:

$$B_{i,\ell}(t) = (1 - t)\left( q_{0,0,i,\ell} + 2 q_{1,0,i,\ell} \right) + t \left( q_{0,1,i,\ell} + 2 q_{1,1,i,\ell} \right), \quad 0 \le t \le 1 \quad (S3)$$

Substituting Eq. S3 into Eq. S1 we get:

$$
\begin{aligned}
z = {} & \Big( 1 - \big[ (1-t)\left( q_{0,0,0,0} + 2 q_{1,0,0,0} \right) + t \left( q_{0,1,0,0} + 2 q_{1,1,0,0} \right) \big] \Big) \left( q_{0,0,0,1} + 2 q_{1,0,0,1} \right) \\
& + \big[ (1-t)\left( q_{0,0,0,0} + 2 q_{1,0,0,0} \right) + t \left( q_{0,1,0,0} + 2 q_{1,1,0,0} \right) \big] \left( q_{0,1,0,1} + 2 q_{1,1,0,1} \right)
\end{aligned} \quad (S4)
$$

which, when multiplied out, becomes:

$$
\begin{aligned}
z = {} & q_{0,0,0,1} + 2 q_{1,0,0,1} \\
& - q_{0,0,0,1} q_{0,0,0,0} - 2 q_{0,0,0,1} q_{1,0,0,0} - 2 q_{1,0,0,1} q_{0,0,0,0} - 4 q_{1,0,0,1} q_{1,0,0,0} \\
& + t q_{0,0,0,1} q_{0,0,0,0} + 2 t q_{0,0,0,1} q_{1,0,0,0} + 2 t q_{1,0,0,1} q_{0,0,0,0} + 4 t q_{1,0,0,1} q_{1,0,0,0} \\
& - t q_{0,0,0,1} q_{0,1,0,0} - 2 t q_{0,0,0,1} q_{1,1,0,0} - 2 t q_{1,0,0,1} q_{0,1,0,0} - 4 t q_{1,0,0,1} q_{1,1,0,0} \\
& + q_{0,1,0,1} q_{0,0,0,0} + 2 q_{0,1,0,1} q_{1,0,0,0} + 2 q_{1,1,0,1} q_{0,0,0,0} + 4 q_{1,1,0,1} q_{1,0,0,0} \\
& - t q_{0,1,0,1} q_{0,0,0,0} - 2 t q_{0,1,0,1} q_{1,0,0,0} - 2 t q_{1,1,0,1} q_{0,0,0,0} - 4 t q_{1,1,0,1} q_{1,0,0,0} \\
& + t q_{0,1,0,1} q_{0,1,0,0} + 2 t q_{0,1,0,1} q_{1,1,0,0} + 2 t q_{1,1,0,1} q_{0,1,0,0} + 4 t q_{1,1,0,1} q_{1,1,0,0}
\end{aligned} \quad (S5)
$$

Now, considering that our objective function is the MSE defined in Eq. 9, we see that we must square Eq. S5. As this will quickly blow up into a much larger equation, we will simplify it for this example and just take the last two terms in Eq. S5 and square them to see that the problem quickly becomes higher order than quadratic, with more than two $q$ variables being multiplied together. Keeping in mind that a binary variable squared is equal to the original binary variable:

$$\left( 2 t q_{1,1,0,1} q_{0,1,0,0} + 4 t q_{1,1,0,1} q_{1,1,0,0} \right)^2 = 4 t^2 q_{1,1,0,1} q_{0,1,0,0} + 16 t^2 q_{1,1,0,1} q_{0,1,0,0} q_{1,1,0,0} + 16 t^2 q_{1,1,0,1} q_{1,1,0,0} \quad (S6)$$

Even in this simplistic example, where we only have three unique binary variables in Eq. S6, the variables become intertwined once everything is squared by the MSE objective function, and auxiliary variables become necessary to bring the problem back to quadratic. This problem only becomes more pronounced when using higher-order Bézier curves within the KAN, as everything being input into each Bézier curve is raised to the degree of that curve, providing more opportunities to create a higher-order problem.
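
This blow-up can be checked symbolically. The following is a small sketch (using sympy, with illustrative symbol names) that squares the last two terms of Eq. S5, applies $q^2 = q$, and shows the resulting cubic interaction from Eq. S6:

```python
import sympy as sp

t = sp.Symbol("t")
# Binary variables from Eq. S5 (indices flattened into the names).
q1101, q0100, q1100 = sp.symbols("q_1101 q_0100 q_1100")

# Last two terms of Eq. S5, then squared as the MSE objective would do.
expr = (2*t*q1101*q0100 + 4*t*q1101*q1100) ** 2
expanded = sp.expand(expr)

# Enforce q**2 == q for the binary variables (t keeps its square).
expanded = expanded.replace(lambda e: e.is_Pow and e.base in (q1101, q0100, q1100),
                            lambda e: e.base)
print(expanded)
# -> 4*t**2*q_0100*q_1101 + 16*t**2*q_0100*q_1100*q_1101 + 16*t**2*q_1100*q_1101

# The middle term couples three binary variables, i.e. the problem is now a HUBO
# and needs an auxiliary variable to be reduced back to quadratic (QUBO) form.
```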
