Coherent Feed Forward Quantum Neural Network
of study. Current QNN models primarily employ variational circuits on an ansatz or a quantum feature map, often requiring multiple entanglement layers. This methodology not only increases the computational cost of the circuit beyond what is practical on near-term quantum devices but also misleadingly labels these models as neural networks, given their divergence from the structure of a typical feed-forward neural network (FFNN). Moreover, the circuit depth and qubit requirements of these models scale poorly with the number of data features, posing an efficiency challenge for real-world machine-learning tasks. We introduce a bona fide QNN model that matches the versatility of a traditional FFNN in terms of its adaptable intermediate layers and nodes, and that forgoes intermediate measurements, such that our entire model is coherent. This model stands out for its reduced circuit depth and smaller number of requisite C-NOT gates, with which it outperforms prevailing QNN models. Furthermore, the qubit count in our model is unaffected by the number of data features. We test our proposed model on benchmarking datasets such as the diagnostic breast cancer (Wisconsin) and credit card fraud detection datasets. We compare the outcomes of our model with those of existing QNN methods to showcase the advantageous efficacy of our approach, even with a reduced requirement on quantum resources. Our model paves the way for the application of quantum neural networks to real, relevant machine-learning problems.
[Figure 2(a): circuit schematic with n qubits initialized in |0⟩, a feature map U(X), and a variational circuit U_var(θ).]
FIG. 2. (a) Architecture of a QNN acting on n qubits; the data X are loaded with a feature map U(X) and the data are processed using a parametrized circuit U_var(θ). Subsequent measurement allows the parameters θ to be optimized and updated. These steps may be repeated. (b) A 3-qubit feature map circuit administering the commonly used ZZFeatureMap. Here H represents the Hadamard gate, P represents the phase gate, X̃_i = 2X_i, and X_{ij} = 2(π − X_i)(π − X_j). (c) A 3-qubit variational circuit with weight parameters {θ_j} made explicit.
ℝ^N onto the quantum state using a feature map U(X) that can be implemented by a quantum circuit [33]. Next, a variational circuit U_var(θ) is applied, with intelligently chosen single-qubit rotations R(θ_i) and entanglement layers, to an input state U(X)|0⟩^⊗n, where the θ_i parameters are the trainable weights and n is the number of qubits employed, which usually scales linearly with N. Finally, the expectation value of some observable (e.g. Ẑ^⊗n) is measured for the classical post-processing to predict the class ỹ of the input data. Optimizing the weight parameters of the variational circuit is done using classical optimizers, eventually minimizing the cost function C(y(X, θ), ỹ) under consideration [7]. A schematic diagram of a typical QNN is shown in Fig. 2.
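For concreteness, this standard pipeline can be sketched in a few lines. The snippet below is a hedged illustration assuming Qiskit's circuit library, with ZZFeatureMap and RealAmplitudes standing in for U(X) and U_var(θ); it is not the authors' released code.

```python
# Minimal sketch of the generic QNN of Fig. 2, assuming Qiskit's circuit
# library: a feature map U(X) loads the data, a variational ansatz
# U_var(theta) processes it, and all qubits are measured.
import numpy as np
from qiskit.circuit.library import ZZFeatureMap, RealAmplitudes

n = 3                                                    # qubits ~ features N
feature_map = ZZFeatureMap(feature_dimension=n, reps=1)  # U(X)
ansatz = RealAmplitudes(num_qubits=n, reps=1)            # U_var(theta)
qnn = feature_map.compose(ansatz)                        # load, then process

X = np.random.rand(n)                                    # one data point
theta = np.random.rand(ansatz.num_parameters)            # trainable weights
bound = qnn.assign_parameters(
    {**dict(zip(feature_map.parameters, X)),
     **dict(zip(ansatz.parameters, theta))})
bound.measure_all()   # samples estimate <Z...Z> for classical post-processing
```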
While this approach to QNNs demonstrates promising outcomes in certain applications, it faces substantial challenges in scalability and gate complexity, particularly on NISQ devices [16, 34]. The number of qubits required in QNNs tends to increase linearly with the number of data features, quickly exceeding the limited qubit capacity of current quantum hardware and thus restricting their applicability to smaller datasets, instead of making use of the exponential scaling of the Hilbert-space dimension with the number of qubits. Additionally, the number of controlled-NOT (C-NOT) gates, crucial for creating entanglement in quantum circuits, scales nearly quadratically with the number of features. This scaling is problematic on NISQ devices due to increased error rates and circuit depth, leading to higher likelihoods of decoherence and computational inefficiency. The combination of these scalability issues with the gate-complexity challenges significantly hinders the practical implementation of QNNs on existing quantum platforms, making the handling of complex, feature-rich datasets a formidable task and posing a significant bottleneck in fully leveraging quantum computing for advanced machine-learning applications.

II. RESULTS

A. The Model: Coherent Feed-Forward Quantum Neural Network

Here, we introduce our CFFQNN model, which uses a quantum-classical hybrid approach to process data. The source code for all of our work can be found on GitHub as detailed below. The initial encoding layer is similar to the first hidden layer of a conventional ANN, as illustrated in Fig. 3, with the classical data loaded onto quantum states. Subsequent layers consist of a network of single-qubit and controlled (entangling) rotation gates, all of which are parametrized by their rotation angles, as exemplified in Fig. 4. Ultimately, the qubits undergo measurement.
Notably, the parameterized controlled rotations are adaptable to cater to specific network demands, and the measurement process can be tailored based on both the data and the desired structural outcome. This circuitry inherently mirrors the architecture of an ANN, as can be seen by comparing Fig. 5(a) versus (b).

We elect to perform all rotations about the y-axis for reasons that will become clear shortly. These are mathematically described by the single-qubit rotation gate expressed in the single-qubit computational basis {|0⟩, |1⟩} as

R_y(θ) = exp(−iθσ_y/2) = [[cos(θ/2), −sin(θ/2)], [sin(θ/2), cos(θ/2)]],   (1)

using the Pauli matrix σ_y. Sequential rotations about the same axis commute and act additively as

R_y(θ_1) R_y(θ_2) = R_y(θ_1 + θ_2).   (2)

In the first layer, the data points with weights are encoded as the rotation angle of the R_y gate acting on some initial state, where the latter is taken to be some general state R_y(b)|0⟩. All of the data points are successively encoded onto the same qubit, which is schematized in Fig. 3. Since all of the rotations are about the same axis, such an encoding can be performed with a single single-qubit gate parametrized by θ = z = ∑_{i=1}^{N} X_i W_i + b:

R_y(X_N W_N) ··· R_y(X_2 W_2) R_y(X_1 W_1) [R_y(b)|0⟩] = R_y(z)|0⟩.   (3)

This is the first efficiency resulting from all of the rotations being about the same axis, which reduces the number of data-encoding gates and is also responsible for better mimicking an ANN by directly encoding the variable z without giving a preference to the ordering among nodes within a given layer.
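As a numerical sanity check of Eqs. (2) and (3) — our own illustration, not taken from the paper's repository — the product of successive y-rotations indeed collapses to a single R_y(z):

```python
# Verify Ry(X_N W_N)...Ry(X_1 W_1) Ry(b)|0> = Ry(z)|0> with z = sum_i X_i W_i + b.
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

rng = np.random.default_rng(7)
X, W, b = rng.random(5), rng.random(5), 0.3   # toy features, weights, bias
z = X @ W + b

state = ry(b) @ np.array([1.0, 0.0])          # biased initial state Ry(b)|0>
for x, w in zip(X, W):                        # successive data encodings
    state = ry(x * w) @ state
assert np.allclose(state, ry(z) @ np.array([1.0, 0.0]))
```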
FIG. 3. The depiction of the data encoding stage. Rotation gates with angles X_i W_i act on a single qubit in analogy with inputs acting on a single neuron. An extra rotation gate with X_0 = 1 and W_0 = b is added for flexibility to bias the initial qubit.

This data encoding procedure can be repeated on multiple distinct qubits to allow for more nonlinear processing of the data in the quantum circuit. Therefore, a CFFQNN with n qubits in the first layer can be described as

|Ψ⟩ = [R_y(z)|0⟩]^⊗n.   (4)

Such single gates are exemplified in the "1st hidden layer" segment of Fig. 5(b). Since n is, in principle, independent from N, the size of the circuit need not depend on the number of data features, unlike the case for a traditional QNN.

In the subsequent layers, corresponding to hidden layers of an ANN, we initialize new qubits by applying a parametrized rotation gate as a biasing term R_{y;j}(θ_{0j}) acting on the jth qubit; then we apply parametrized controlled rotation gates that are controlled by the qubits in the first layer. For example, if we want to connect the first node in the first layer to the first node in the second layer, we apply a rotation on the (n+1)th qubit controlled by the state of the 1st qubit:

CR_y^{1→n+1}(θ) = |0⟩_1⟨0| ⊗ I_{n+1} + |1⟩_1⟨1| ⊗ R_{y;n+1}(θ).   (5)

The other nodes are similarly connected by controlled-rotation gates CR_y^{i→j}(θ_{ij}), each parametrized by independent weights θ_{ij}. Even though the rotations are controlled by different control qubits in different states, all of the rotations on a new qubit commute, such that the aggregate effect on a node in a hidden layer is independent of any ordering among the nodes in the previous layer.

FIG. 4. An illustration of a single intermediate node in the CFFQNN and how the weight values are applied from one node to the next using controlled rotations by angle W_i, with a possible single-qubit rotation by bias angle W_0.

Figure 4 displays a single intermediate node in the CFFQNN's hidden layer. It illustrates how the node applies weight values (denoted as W) to these nodes depending on the value of the last-layer nodes. After this operation, the state |φ_i⟩ ⊗ |0⟩ changes to a new state |ψ_{ij}⟩ = CR_y^{i→j}(θ_{ij}) |φ_i⟩ ⊗ |0⟩ that may continue to be acted upon by other control layers.

While a classical ANN uses a perceptron to decide whether or not a certain node should forward its output to the next one, the CFFQNN retains the branches of the wavefunction associated with the control qubit being in each of the states |0⟩ and |1⟩ without collapsing the state via measurement. For example, the branch of the state where all of the control qubits are in state |0⟩ only applies the bias rotation to the target qubits, while the
branch where all of the control qubits are in state |1⟩ applies the gate R_{y;j}(∑_{i=0}^{n} θ_{ij}) to qubit j. Overall, the action takes the form

U = ∏_{ij} CR_y^{i→j}(θ_{ij}) = ∑_X |X⟩⟨X| ⊗ [⨂_j R_{y;j}(∑_{i=0}^{n} X_i θ_{ij})].   (6)

Here, X = X_1, ···, X_n is a bit string with elements X_i ∈ {0, 1}, the sum runs over all such strings, and we have included X_0 = 1 to represent the bias term; the tensor product over j implies that the rotation on the jth qubit associated with the branch of the wavefunction where the qubits are in state |X⟩ is by an angle ∑_{i=0}^{n} X_i θ_{ij}, corresponding to the standard factor in ANNs. This structure is schematized in the "2nd hidden layer" segment of Fig. 5(b), where the connections between control and target qubits are explicit and their weights are given accordingly. All output values of the perceptron are essentially kept coherently, and the system is ready to have the process repeated in a subsequent layer.

Put another way, we have replaced the nonlinear activation function σ in a standard perceptron by the controlled operations CR_y. Unlike σ, the output of CR_y is not deterministic; it is probabilistic. Nevertheless, if we measure one of the control qubits, we know we will find state |0⟩ with probability p(0) = cos²(z/2) and state |1⟩ with the remaining probability p(1) = sin²(z/2), depending on the value of z = ∑_{i=1}^{N} X_i W_i + b. We can turn such probabilities into binary outputs by simply choosing the larger of the two, which are split at the value α = π/2:

ỹ = { 1, p(1) > p(0) ⟺ z > α;  0, p(1) < p(0) ⟺ z < α }.   (7)

In this sense, we have created a coherent QNN that retains all of the properties of an ANN while allowing for data to be processed without each perceptron being restricted to a binary output.
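A small sketch (our illustration) of this decision rule: compute the two branch probabilities and pick the larger, which for z ∈ [0, π] reproduces the threshold at α = π/2.

```python
# Branch probabilities p(0) = cos^2(z/2), p(1) = sin^2(z/2) and the binary
# decision of Eq. (7); for z in [0, pi], p(1) > p(0) iff z > pi/2.
import numpy as np

def branch_probs(z):
    p0 = np.cos(z / 2) ** 2
    return p0, 1.0 - p0

alpha = np.pi / 2
for z in (0.3, 2.0):
    p0, p1 = branch_probs(z)
    y_tilde = int(p1 > p0)                   # equivalent to int(z > alpha) here
    print(f"z={z:.1f}: p(0)={p0:.3f}, p(1)={p1:.3f}, y~={y_tilde}")
```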
After repeating the process for multiple layers, the number of controlled (entangling) gates is given by the number of connections between the layers. This is at most a quadratic function of the number of nodes per layer and a linear function of the depth of the neural network, which are expected to grow with the number of features in the data but do not follow a fixed relationship. One can thus create a few-qubit quantum neural network with n ≪ N and verify empirically its success for a given machine-learning task.

Finally, after all of the intermediate (hidden) layers, we perform a measurement and calculate the expectation value of some operator, which we further use to decide the class of the data point, as in a classical NN. In our work we often measure the value of the final (Nth) qubit, ⟨Z_N⟩, as depicted in the "output layer" segment of Fig. 5(b). The measurement result is fed into a classical nonlinear activation function σ to produce the binary outcome

ỹ = { 1, σ(⟨Z_N⟩) = 1;  0, σ(⟨Z_N⟩) = 0 }.   (8)

More general outcomes can be considered by either dividing the range of expectation values ⟨Z_N⟩ into more than two segments or by measuring more final qubits to yield more possible final outcomes. The results of such outcomes can be used to update the encoding weights W_i and intermediate weights θ_{ij} throughout the network in an iterative fashion.

This method can create a complete FFNN like a standard classical one without doing intermediate measurements. An overall schematic diagram of a CFFQNN with two hidden layers with four and three qubits, respectively, is shown in Fig. 5(b).

1. Different Variations of the Model

In our work, we deploy two distinct versions of the CFFQNN model: the standard CFFQNN and a variant that we dub FixedCFFQNN. While FixedCFFQNN retains the architectural design of the CFFQNN, it diverges in one key aspect: the weights in its initial layer remain untrained. This reduction in the number of parameters to be trained significantly speeds the training process, while the model still outperforms previous QNNs.

2. Hyperparameters of the CFFQNN Model

The architecture of the CFFQNN closely mirrors that of an ANN, sharing many of the same hyperparameters. This allows for customization in terms of the number of layers and nodes within each layer. Additionally, the measurement scheme can be tailored to fit specific data needs; even the parameter α in Eq. (7) is a hyperparameter. For instance, throughout our research, we employed a single measurement strategy for the CFFQNN, measuring only the final qubit. For the FixedCFFQNN, we adopted a partial measurement approach, targeting all qubits except those in the initial layer.
[Figure 5: panel (a) shows an ANN with a bias node and inputs X_0, …, X_i feeding hidden nodes σ(h_{ij}) and an output; panel (b) shows the corresponding CFFQNN circuit, with "1st hidden layer" and "2nd hidden layer" qubit registers initialized in |0⟩ and encoding rotations by angles W_i X_i.]
FIG. 5. (a) Architecture of an artificial neural network with two hidden layers. Here W represents the weight parameters, X are data points, σ is a non-linear activation function, and h_{ij} = W_{ij} X_i. (b) Architecture of a CFFQNN with two hidden layers, where X are data points and W and θ represent the weight parameters. The number of nodes in a layer of the ANN corresponds to the number of qubits in a layer of the CFFQNN. In contrast to earlier QNN models such as those in Fig. 2(c), the parameters of the CFFQNN change the controlled operations, such that the CFFQNN circuits are not solely parametrized by their single-qubit gates; this is what allows the CFFQNN to resemble an ANN.
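The parametrized controlled operations highlighted in this caption are straightforward to emulate classically for small registers. The sketch below (our illustration) builds the operator of Eq. (5) explicitly for three qubits and confirms the claim used above: controlled rotations sharing a target but differing in control commute.

```python
# Build CR_y^{c->t}(theta) = |0><0|_c (x) I + |1><1|_c (x) Ry(theta)_t and check
# that gates sharing a target but differing in control commute.
from functools import reduce
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

P0, P1, I2 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0]), np.eye(2)

def kron(ops):
    return reduce(np.kron, ops)

def cry(theta, control, target, n=3):
    ops0 = [I2] * n; ops0[control] = P0      # control in |0>: do nothing
    ops1 = [I2] * n; ops1[control] = P1      # control in |1>: rotate target
    ops1[target] = ry(theta)
    return kron(ops0) + kron(ops1)

A = cry(0.4, control=0, target=2)            # CR_y^{1->3}(0.4)
B = cry(1.1, control=1, target=2)            # CR_y^{2->3}(1.1)
assert np.allclose(A @ B, B @ A)             # ordering is irrelevant
```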
[...] quantum algorithms.

C. Data and Metrics

[...]

D. Results on Credit Card Dataset

For this study, we used PCA to reduce the dimension of the credit card dataset to seven features. For both [...]

[Figure: bar charts comparing circuit depth, number of trainable parameters, C-NOT count, and runtime across the models considered.]

[...] demonstrated the superior performance of the CFFQNN in classification tasks, including a notable variant: the FixedCFFQNN. In the latter approach, we did not train any parameters in the first layer, yet it still outperformed existing QNN models. This untrained variant [...] resources. These characteristics of our model should encourage its realization on various quantum computing hardware as initial steps toward a scalable implementation for practical applications of quantum machine learning.

IV. METHODS
Because some of the features may be highly correlated with each other or may contribute less to the overall variance in the data distribution, a linear transformation of the coordinates in the feature space can elicit the principal components, which are the new coordinate axes that account for most of the independent information contained in the features and allow one to neglect the axes along which the data change less. Such principal component analysis (PCA) is standard in data processing and here reduces both 30-feature datasets to seven principal features each. These details are summarized in Table I.

In the case of the Credit Card dataset, we also addressed the issue of class imbalance. To ensure unbiased training and evaluation, we eliminated the excess class instances, balancing the dataset. This step enabled our model to learn from both the minority and majority classes more effectively, thereby enhancing its ability to detect fraudulent transactions accurately. By employing these preprocessing techniques on the selected datasets, we aimed to create a robust and reliable framework for evaluating the effectiveness of our quantum machine learning model.

The standard QNN is programmed as follows. Every qubit is initialized in the superposition state (|0⟩ + |1⟩)/√2 by means of a Hadamard transformation, then the data features are encoded using a phase gate

P(X̄_i) = |0⟩⟨0| + e^{iX̄_i}|1⟩⟨1|   (9)

acting on the ith qubit; this is the ZFeatureMap with X̄_i = 2X_i and is the first stage of the ZZFeatureMap. The ZZFeatureMap then continues to sequentially entangle the ith and jth qubits and again upload the same data onto the quantum state, using the sequence of gates

G_{ij} = CNOT^{i→j} [I_i ⊗ P_j(X_{ij})] CNOT^{i→j},   (10)

which uses the controlled-NOT gate CNOT^{i→j} = |0⟩⟨0| ⊗ I + |1⟩⟨1| ⊗ σ_x and the nonlinear function of the parameters X_{ij} = 2(π − X_i)(π − X_j). All of the qubits are pairwise entangled using a sequence of G_{ij} operators for various i and j. However, the relationship between the number of C-NOT gates and the number of qubits is not fixed in a strict mathematical sense; rather, it depends on the specific architecture of the ZZFeatureMap quantum circuit and the requirements of the algorithm being implemented. Here, we used the circuit with the full entanglement option, which requires N(N−1)/2 entangling blocks G_{ij}, each containing two C-NOT gates, to fully entangle all pairs of qubits in a single repetition of the circuit.
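These counts can be confirmed directly. The following hedged sketch assumes Qiskit's built-in ZZFeatureMap and tallies the entangling gates for N = 7 features with full entanglement:

```python
# Count entangling gates in one repetition of the ZZFeatureMap for N = 7.
# With 'full' entanglement there are N(N-1)/2 = 21 pairwise blocks G_ij,
# each compiling to two C-NOTs, so we expect 42 'cx' operations.
from qiskit.circuit.library import ZZFeatureMap

N = 7
fmap = ZZFeatureMap(feature_dimension=N, reps=1, entanglement="full")
print(fmap.decompose().count_ops().get("cx", 0))   # -> 42 = 2 * N(N-1)/2
```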
After all of the features are uploaded, the next step of the QNN is a parametrized quantum circuit. A single repetition of this consists of at least 2N single-qubit rotation gates R_y(θ_i), two acting on each qubit, separated by fixed entangling gates. Each qubit experiences one parametrized R_y gate, then a sequence of entangling gates CNOT^{N−1→N} ··· CNOT^{2→3} CNOT^{1→2} is applied, then the process is repeated in alternating fashion and ends with parametrized single-qubit rotation gates, for a total of 2(N − 1) controlled operations in the parametrized circuit. All the qubits are then measured in the computational basis and the measurement result is processed in the same way as for the CFFQNN detailed below.

In comparison, the data may be encoded into any number of qubits for the CFFQNN, with more qubits being required for subsequent manipulations that correspond to hidden layers of ANNs. Just like in classical machine learning, there is no a priori method for determining how many layers and how many nodes in each layer will be required for the success of training the network for a particular dataset. We choose to encode our datasets' seven features into three qubits, corresponding to the first hidden layer, process them with a second hidden layer comprising two qubits, then funnel the quantum information into a final qubit, such that the total number of qubits is only six.

The three qubits have the same data redundantly uploaded into them using no entangling operations: the operator R_y(∑_{i=1}^{N} W_i X_i + b) is applied to each of the first three qubits as in the main text. To process the data and forward it to the second hidden layer, a controlled operation is required between each pair of qubits from the first and second layers, such that twelve operations of the form CR_y^{i→j}(θ_{ij}) with unique parametrized weights θ_{ij} are applied. A single-qubit rotation corresponding to a biasing term is also applied to each qubit in the second layer. Finally, three controlled operations CR_y^{j→N}(θ_{jN}) are performed between the qubits in the second hidden layer and the final qubit, along with a biasing term on the final qubit, for a total of 16 controlled operations. The final qubit is measured in the computational basis.
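A hedged sketch of this 3-2-1 CFFQNN layout follows; it is our own construction in Qiskit rather than the authors' released code, with placeholder (untrained) weight values, and its raw gate tally will differ from the counts quoted above depending on whether each CR_y is counted via its two-C-NOT decomposition.

```python
# Three first-layer qubits with the single-gate encoding Ry(z), a two-qubit
# second hidden layer, and one output qubit, wired with controlled rotations.
import numpy as np
from qiskit import QuantumCircuit

rng = np.random.default_rng(0)
N = 7                                      # features per data point
X, W, b = rng.random(N), rng.random(N), 0.1
theta = rng.random((3, 2))                 # first -> second layer weights
theta_out = rng.random(2)                  # second layer -> output weights

z = X @ W + b                              # Eq. (3): one angle per layer-1 qubit
qc = QuantumCircuit(6, 1)                  # qubits 0-2, 3-4, and 5 = output
for q in range(3):
    qc.ry(z, q)                            # redundant upload, no entanglers
for j in range(2):
    qc.ry(rng.random(), 3 + j)             # bias rotation on hidden qubit
    for i in range(3):
        qc.cry(theta[i, j], i, 3 + j)      # CR_y^{i->j}(theta_ij)
qc.ry(rng.random(), 5)                     # bias on the final qubit
for j in range(2):
    qc.cry(theta_out[j], 3 + j, 5)         # funnel into the output qubit
qc.measure(5, 0)                           # sample to estimate <Z_N>
```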
For both setups, the single output parameter ⟨Z_N⟩ is fed into a classical nonlinear function and used to classify the input data. At least 70% of the available data points are used and the models are scored on how well they correctly predict the classification of those data. The parametrized circuits are then updated with new parameters obtained from the COBYLA [36] optimizer and this process is repeated iteratively until convergence.
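The classical update loop can be as simple as handing the score to SciPy's COBYLA implementation; the sketch below is our illustration of that loop, with a smooth toy objective standing in for the actual circuit evaluation.

```python
# Iterative weight updates with COBYLA; in the real workflow `cost` would run
# the parametrized circuit, threshold <Z_N> per training point, and return
# the misclassification rate.
import numpy as np
from scipy.optimize import minimize

def cost(params):
    return float(np.sum((params - 0.5) ** 2))   # toy stand-in objective

init = np.random.default_rng(1).random(12)      # e.g. the theta_ij weights
result = minimize(cost, init, method="COBYLA", options={"maxiter": 200})
print(result.fun)
```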
TABLE I. Properties of the datasets used to evaluate the CFFQNN and compare it to existing neural networks.

Datasets                                | Features | Features used | Training size | Testing size | Labels
Credit card fraud detection (balanced) | 30       | 7             | 688           | 296          | 2
Breast cancer diagnostic (Wisconsin)    | 30       | 7             | 455           | 114          | 2
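The preprocessing in Table I can be reproduced along these lines; a hedged sketch assuming scikit-learn, whose bundled Wisconsin diagnostic dataset has the 569 = 455 + 114 samples listed above:

```python
# PCA from 30 features down to 7 and a train/test split matching Table I.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)    # 569 samples, 30 features
X7 = PCA(n_components=7).fit_transform(X)     # seven principal features
X_train, X_test, y_train, y_test = train_test_split(
    X7, y, train_size=455, test_size=114, random_state=0)
```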
[1] M. Schuld and F. Petruccione, Machine Learning with Quantum Computers (Springer International Publishing, Cham, Switzerland).
[2] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition, 10th ed. (Cambridge University Press, USA, 2011).
[3] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Nature 549, 195–202 (2017).
[4] M. Schuld, I. Sinayskiy, and F. Petruccione, Contemporary Physics 56, 172 (2014).
[5] C. Zoufal, A. Lucchi, and S. Woerner, Quantum Machine Intelligence 3, 10.1007/s42484-020-00033-7 (2021).
[6] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, Nature 567, 209 (2019).
[7] D. P. García, J. Cruz-Benito, and F. J. García-Peñalvo, arXiv:2201.04093 (2022).
[8] D. N. Diep, Int. J. Theor. Phys. 59, 1179 (2020).
[9] A. Chalumuri, R. Kune, and B. S. Manoj, Quantum Inf. Process. 20, 1 (2021).
[10] B.-Q. Chen and X.-F. Niu, Int. J. Theor. Phys. 59, 1978 (2020).
[11] F. Tacchino, S. Mangini, P. Kl. Barkoutsos, C. Macchiavello, D. Gerace, I. Tavernelli, and D. Bajoni, IEEE Transactions on Quantum Engineering 2, 1 (2021).
[12] J. Wang, Y. Chen, R. Chakraborty, and S. X. Yu, arXiv:1911.12207 (2019).
[13] Y. Li, R.-G. Zhou, R. Xu, J. Luo, and W. Hu, Quantum Sci. Technol. 5, 044003 (2020).
[14] S. L. Wu, S. Sun, W. Guan, C. Zhou, J. Chan, C. L. Cheng, T. Pham, Y. Qian, A. Z. Wang, R. Zhang, M. Livny, J. Glick, P. Kl. Barkoutsos, S. Woerner, I. Tavernelli, F. Carminati, A. Di Meglio, A. C. Y. Li, J. Lykken, P. Spentzouris, S. Y.-C. Chen, S. Yoo, and T.-C. Wei, Phys. Rev. Research 3, 033221 (2021), arXiv:2104.05059.
[15] K. H. Wan, O. Dahlsten, H. Kristjánsson, R. Gardner, and M. S. Kim, npj Quantum Inf. 3, 1 (2017).
[16] K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, D. Scheiermann, and R. Wolf, Nat. Commun. 11, 1 (2020).
[17] D. Bondarenko and P. Feldmann, Phys. Rev. Lett. 124, 130502 (2020).
[18] I. Cong, S. Choi, and M. D. Lukin, Nat. Phys. 15, 1273 (2019).