END-TO-END AMP MODELING: FROM DATA TO CONTROLLABLE GUITAR AMPLIFIER MODELS
ABSTRACT
This paper describes a data-driven approach to creating real-time neural network models of guitar
amplifiers, recreating the amplifiers’ sonic response to arbitrary inputs at the full range of controls
present on the physical device. While the focus of the paper is on the data collection pipeline,
we demonstrate the effectiveness of this conditioned black-box approach by training an LSTM
model for the task and comparing its performance to an offline white-box SPICE circuit simulation.
Our listening test results demonstrate that the neural amplifier modeling approach can match the
subjective performance of a high-quality SPICE model, all while using an automated, non-intrusive
data collection process, and an end-to-end trainable, real-time feasible neural network model.
1 Introduction
Digital guitar amplifier modeling is the process of recreating the behaviour of analog amplifier circuitry in a computer
program. A successful digital model recreates the amplifier response to user input¹ in a way that is perceptually
indistinguishable from the analog reference, both in terms of sound and feel. Modeling technologies therefore give a
wide range of users, from amateur to professional musicians, audio engineers, and producers, convenient and reliable
access to tonal captures of their desired devices. Another benefit is that they widen access to analog devices that are
scarce and expensive to acquire, such as boutique and vintage devices.
In a physical amplifier circuit, complex interactions take place between reactive elements, such as capacitors and
inductors, and non-linear elements, such as vacuum tubes, transistors, and diodes. These non-linear interactions
constitute a dynamic system and are largely responsible for the desired sonic characteristics of guitar amplifiers.
White-box models based on nonlinear circuit simulation aim to directly recreate this complex analog system in the
digital domain [1, 2, 3, 4]. The circuit is represented as a system of nonlinear differential equations, which are solved
using numerical methods. Assuming knowledge of the modeled circuit, these methods can produce accurate emulations,
particularly for solid-state circuits with relatively few components. However, circuits with many reactive elements
coupled with nonlinearities can become prohibitively expensive to simulate in real time. In addition, the behavior of
vacuum tubes and transformers is in most cases difficult to describe or approximate analytically [5].
A more lightweight modeling approach is to approximate the behavior of guitar amplifier stages using a combination of
filtering and waveshaping. In discrete time, such approximation can be realized as an ensemble of linear time-invariant
(LTI) digital filters and static nonlinear functions. Determining the transfer functions of the filtering operations and
tuning the non-linear behavior to the reference device is performed by a combination of careful circuit analysis and
empirical adjustment by signal measurements. Although the filter-waveshaper approach is extendable to block-oriented
gray-box modeling [6, 7, 8], this empirical process can be time-consuming and prone to errors.
¹ By user input we refer to the digitized audio signal of the musical instrument performance of the user and to the device-specific
controls that affect auditory characteristics.
Figure 1: Neural network amplifier model training scheme. A model f receives control positions c as conditioning, and
maps an input signal x to resemble a target output signal y. A loss function L scores the similarity of model output ŷ
and y and provides a learning signal to adjust the model parameters to improve the resemblance.
Previous work on neural network models for guitar amplifiers has proposed high-quality real-time solutions [9, 10, 11,
12], but the research so far has limited itself to learning an input-output mapping at static control settings. Although
there has been initial work on controllable neural amp modeling [13], the results were limited to a single control variable
and relied on a simulated ground truth to turn the control knob virtually.
While the present neural amp modeling techniques manage to capture single control settings, reproducing the full range
of behaviour on varying amplifier controls presents further challenges for digital models. A faithful model should
recreate the behaviour of the amplifier at all combinations of the controls, including potentiometer response curves and
complex interactions between various control settings. Moreover, the combinatorial space containing the various control
settings grows exponentially with the number of controls, presenting a challenge not only for the modeling, but also for
the perceptual validation of the model. Manually verifying the model accuracy at a handful of control positions can
only cover a tiny fraction of the control space in a reasonable amount of time. Therefore, modeling physical amplifiers
at their full range of controls using black-box methods requires an efficient and reliable data collection pipeline.
This paper outlines a data-driven approach to making controllable neural guitar amplifier models. In addition to the
novel problem formulation, the paper describes an automated data-collection pipeline suitable for physical amplifier
devices. The approach is validated by using the data to train a controllable neural network model and comparing the
subjective quality to the reference and an offline white-box SPICE circuit model.
The paper is structured as follows. Section 2 defines the data-driven guitar amplifier modeling problem in a concise
form, and expands on the practical considerations of collecting data from real amplifiers. Section 3 briefly describes
neural network architectures suitable for the guitar amp modeling task. Finally, Section 4 outlines the experiments on
training and evaluating a controllable neural network amp model.
These variables are grouped together as a triple $(x^{(i)}, y^{(i)}, c^{(i)})$ to constitute the i-th training example in a dataset. That
is, the collection of N training examples forms the dataset $\mathcal{D}$ as
$$\mathcal{D} = \left\{ \left(x^{(i)}, y^{(i)}, c^{(i)}\right) \right\}_{i=1}^{N}. \tag{1}$$
The control values can either be continuous, in the case of representing a knob position, or discrete, in the case of
representing a switch position. Furthermore, the controls can be time-varying, but usually change at a much slower rate
than the audio rate signals.
Our primary model of interest is a neural network that accepts x and c as input, and is parameterized by θ, i.e.,
ŷ = f (x, c; θ). Using a loss function L(ŷ, y) that measures the discrepancy between the model predictions ŷ and
targets y, we utilize standard supervised learning with stochastic gradient descent to learn the network parameters θ
that minimize the average loss over our training set
$$\theta^{*} := \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\left(f(x^{(i)}, c^{(i)}; \theta),\, y^{(i)}\right). \tag{2}$$
Given a sufficiently large and representative training set, and an appropriate model, the model generalizes to inputs
outside the training set and learns to interpolate the behaviour of unseen control settings.
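As a rough illustration of the objective in Eq. (2), the following sketch fits a single parameter with plain stochastic gradient descent on synthetic (x, y, c) triples. Everything in it is an assumption made for illustration: the one-parameter `model`, the synthetic "reference device" y = 3·c·x, the squared-error loss, and the learning rate. It works on single samples rather than audio segments and is not the network or training setup used in this paper.

```python
import random


def model(x, c, theta):
    """Toy conditioned model f(x, c; theta): a gain stage scaled by knob position c."""
    return theta * c * x


def fit_sgd(dataset, lr=0.5, epochs=20, seed=0):
    """Minimize the average squared error over (x, y, c) triples with plain SGD."""
    rng = random.Random(seed)
    theta = 0.0
    order = list(range(len(dataset)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the training examples in a new random order
        for i in order:
            x, y, c = dataset[i]
            error = model(x, c, theta) - y
            grad = 2.0 * error * c * x  # d/dtheta of (f(x, c; theta) - y)^2
            theta -= lr * grad
    return theta


# Synthetic "reference device": y = 3 * c * x (hypothetical, for illustration only).
TRUE_THETA = 3.0
data_rng = random.Random(1)
dataset = []
for _ in range(200):
    x = data_rng.uniform(-1.0, 1.0)  # one input sample
    c = data_rng.uniform(0.0, 1.0)   # one normalized knob position
    dataset.append((x, TRUE_THETA * c * x, c))

theta_hat = fit_sgd(dataset)
print(f"estimated theta: {theta_hat:.4f}")
```

Because the control value c enters the model as an input, a single parameter set covers every knob position in the toy data, which is the property the conditioned formulation relies on.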
Data-driven models depend on high quality data for the best results. In the case of amplifier modeling, adjusting and
recording the control positions requires a degree of precision, consistency, and repetition that is simply not feasible
for a human operator. In contrast, the task is well suited to a robot. As such, we have designed and built a robot for the
very task of automated data collection from guitar amplifiers [14, 15]. Figure 2 illustrates the robot connected to an
amplifier.
As seen in Figure 2, electric motors are attached to each relevant control of a physical amplifier. By controlling the
motors, the knobs on the amplifier can be set to any configuration. Furthermore, the robot is connected to an audio
interface, such that audio can be played and recorded while keeping track of the knob positions. During data set
collection, the robot moves the controls to different positions, plays audio through the amplifier and records the output.
As input audio material to the amplifier, we use a large collection of guitar, bass, and synthetic recordings, which are
randomly sampled for each training segment. To ensure good generalisation, it is important that the source material
exhibits as much variation as possible, while being representative of the different types of signals that are expected to
be played through the reference device or the model.
The amplifier control space is continuous, but any practical measurements are confined to a finite collection of control
positions. Therefore, we need to first devise a sampling strategy to collect sufficient information about the controls in a
finite number of samples. Perhaps the simplest approach is to sample each knob at a number of discrete positions, and
to go through all possible combinations. However, this fixed grid approach very soon runs into issues with exponential
scaling. For example, consider a case where each knob is discretized to 10 positions. An amplifier with a single
knob, say, a volume control, would only require ten positions, while adding a gain control would increase the number
of combinations to 100. It is not uncommon for amplifiers to have six or more knobs, which would result in a
million or more combinations on a regular grid (in general, 10^n recordings for n knobs). Furthermore, a switch with m positions
increases the number of recordings by a factor of m.
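The scaling argument above can be checked with a few lines of arithmetic. The helper name `grid_recordings`, the three-position switch, and the one-second recording length used for the time estimate are assumptions chosen for illustration.

```python
def grid_recordings(n_knobs, positions_per_knob=10, switch_positions=()):
    """Number of recordings needed to cover a regular grid of control settings."""
    total = positions_per_knob ** n_knobs
    for m in switch_positions:
        total *= m  # each m-position switch multiplies the count by m
    return total


for n in (1, 2, 6):
    print(n, "knobs:", grid_recordings(n))

# Six knobs plus one hypothetical three-position switch.
print("with switch:", grid_recordings(6, switch_positions=(3,)))

# Pure audio time at an assumed one second per recording, ignoring knob travel.
print("hours of audio:", grid_recordings(6) * 1.0 / 3600)
```

Even under these optimistic assumptions, six knobs at ten positions each already amount to roughly 278 hours of audio before any time spent moving the controls.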
Not only is the fixed grid approach rigid in terms of number of samples, but it runs a risk of overfitting to the grid points.
Instead, we can break the symmetries of the regular grid and freely choose the number of data points by applying a
random sampling strategy. In a randomized sampling strategy, control positions are chosen from a uniform distribution
for each control, resulting in an unbiased sampling of the overall control space across the whole data set. The number
of data points can be chosen freely to strike a balance between a sufficiently dense sampling of the control space and
resource constraints, such as disk storage and recording time.
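A minimal sketch of the randomized sampling strategy, assuming every knob is normalized to the unit interval. The function name `sample_control_settings` and the fixed seed are illustrative choices, not part of the described pipeline.

```python
import random


def sample_control_settings(n_examples, n_knobs, seed=0):
    """Draw each knob position independently from a uniform distribution on [0, 1)."""
    rng = random.Random(seed)  # seeded so the measurement plan is reproducible
    return [[rng.random() for _ in range(n_knobs)] for _ in range(n_examples)]


# The number of data points is chosen freely, independent of the number of knobs.
settings = sample_control_settings(n_examples=500, n_knobs=2)
print(len(settings), "settings, first:", settings[0])
```

Unlike the fixed grid, the number of examples here does not depend on the number of controls, so it can be set directly from the available recording time and storage.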
When sampling a physical device, we care not only about the density with which we sample the control space, but also
how this process affects the wear and tear of the involved mechanical components, and the time it takes to move the
knobs between different positions. Thus, a strategy which minimizes the total travel required to sample all positions
while balancing the travel per component is needed. To address this issue, we combine the random sampling procedure
with an optimized sorting approach. In particular, all measured control configurations are generated ahead of time, and
a sorting of this list is designed to minimize the overall distance travelled. By further assuming that we start and finish
the recording with all controls at zero, finding the optimal path through random samples becomes a traveling salesman
problem (TSP).
Subject to our application, we first define an appropriate distance measure between the different control configurations,
i.e., different control vectors c. As we are concerned about the overall distance travelled by each component, a
natural choice is the $L_1$ distance. More formally, given the desired number of examples $N$, we compute the matrix of
control-position-wise distances $D \in \mathbb{R}^{N \times N}$ as
[Figure 3 plots: control positions with Volume on the horizontal axis and Gain on the vertical axis, both from 0.0 to 1.0. Panels: (a) Random path, (b) Sorted path.]
Figure 3: Example pathfinding solution for the case of two knobs and 500 data points. The data points are shown as
dots, and the line segments between them show the travelled path for a) random order, and b) sorted order using a TSP
solution.
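The path sorting can be sketched as follows, assuming the pairwise cost between two control vectors is their L1 distance (the sum of absolute knob differences) and that the path starts and ends with all controls at zero, as described above. The greedy nearest-neighbour ordering used here is a simple stand-in heuristic chosen for brevity; it is not claimed to be the TSP solver used for Figure 3, and all function names are hypothetical.

```python
import random


def l1_distance(a, b):
    """Total knob travel between two control vectors."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def tour_length(points, order):
    """L1 travel from all-zero controls, through points in order, back to zero."""
    origin = [0.0] * len(points[0])
    path = [origin] + [points[i] for i in order] + [origin]
    return sum(l1_distance(p, q) for p, q in zip(path, path[1:]))


def nearest_neighbour_order(points):
    """Greedy heuristic: always move to the closest unvisited configuration."""
    remaining = list(range(len(points)))
    current = [0.0] * len(points[0])  # start with all controls at zero
    order = []
    while remaining:
        nxt = min(remaining, key=lambda i: l1_distance(current, points[i]))
        remaining.remove(nxt)
        order.append(nxt)
        current = points[nxt]
    return order


rng = random.Random(0)
points = [[rng.random(), rng.random()] for _ in range(100)]  # two knobs

random_order = list(range(len(points)))  # points are already in random order
sorted_order = nearest_neighbour_order(points)

print("random path length:", round(tour_length(points, random_order), 2))
print("sorted path length:", round(tour_length(points, sorted_order), 2))
```

Because all configurations are generated ahead of time, the ordering is computed once offline and the robot then simply follows the sorted list.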
and derives an output variable $y_t$ (e.g., amplifier response to input) from the state
$$h_t = f(x_t, h_{t-1}), \qquad y_t = g(h_t). \tag{5}$$
The amplifier controls can be included as an additional input vector ct to the neural network, allowing a single model to
represent different user control configurations.
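The recursion in Eq. (5) with controls as an additional input can be sketched with a very small tanh recurrent network. This is an illustrative stand-in with randomly initialized, untrained weights: the parameter layout, the hidden size, and the helper names are assumptions, and the controls are held constant over the sequence for simplicity.

```python
import math
import random


def rnn_step(x_t, c_t, h_prev, params):
    """One state update h_t = f(x_t, c_t, h_{t-1}) of a tiny tanh RNN."""
    inputs = [x_t] + list(c_t)  # audio sample followed by control values
    h_t = []
    for w_in, w_rec, b in zip(params["w_in"], params["w_rec"], params["bias"]):
        z = b
        z += sum(w * u for w, u in zip(w_in, inputs))
        z += sum(w * h for w, h in zip(w_rec, h_prev))
        h_t.append(math.tanh(z))
    return h_t


def run_model(x, c, params):
    """Map an input sequence x to an output sequence under fixed controls c."""
    h = [0.0] * len(params["bias"])
    y = []
    for x_t in x:
        h = rnn_step(x_t, c, h, params)
        y.append(sum(w * h_j for w, h_j in zip(params["w_out"], h)))  # y_t = g(h_t)
    return y


def init_params(n_controls, hidden_size, seed=0):
    """Random, untrained weights; a real model would learn these from data."""
    rng = random.Random(seed)

    def rand_row(n):
        return [rng.uniform(-0.5, 0.5) for _ in range(n)]

    return {
        "w_in": [rand_row(1 + n_controls) for _ in range(hidden_size)],
        "w_rec": [rand_row(hidden_size) for _ in range(hidden_size)],
        "bias": rand_row(hidden_size),
        "w_out": rand_row(hidden_size),
    }


params = init_params(n_controls=2, hidden_size=4)
x = [0.0, 0.5, -0.5, 0.25]
y_low = run_model(x, [0.1, 0.1], params)   # same audio, low control values
y_high = run_model(x, [0.9, 0.9], params)  # same audio, high control values
print(y_low)
print(y_high)
```

The same weights produce different outputs for the same audio when the control vector changes, which is what allows one model to represent many control configurations.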
The loss function measures the difference between the model prediction and the target signal. A common starting point
is the mean-squared error (MSE)
$$\mathcal{L}_{\mathrm{MSE}}(\hat{y}, y) = \frac{1}{BT} \sum_{i=1}^{B} \sum_{t=1}^{T} \left\| y_t^{(i)} - f(x_t^{(i)}, c^{(i)}) \right\|_2^2 \tag{6}$$
where T is the number of time steps in a training example, and B is the number of elements in a minibatch. This loss
function corresponds to minimising the energy of the model’s error
$$e_t^{(i)} = y_t^{(i)} - f(x_t^{(i)}, c^{(i)}). \tag{7}$$
An extension to MSE is to normalize the loss by the minibatch target signal energy, which leads to the error-to-signal
ratio (ESR) loss [13]
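$$\mathcal{L}_{\mathrm{ESR}}(\hat{y}, y) = \frac{\sum_{i=1}^{B} \sum_{t=1}^{T} \left\| e_t^{(i)} \right\|_2^2}{\sum_{i=1}^{B} \sum_{t=1}^{T} \left\| y_t^{(i)} \right\|_2^2}. \tag{8}$$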
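The two losses can be sketched in plain Python for a minibatch of single-channel sequences. The function names and the small `eps` term guarding against division by zero are assumptions added for the example; `eps` does not appear in Eq. (8).

```python
def mse_loss(pred, target):
    """Mean squared error over a minibatch of B sequences with T samples each."""
    total, count = 0.0, 0
    for p_seq, y_seq in zip(pred, target):
        for p, y in zip(p_seq, y_seq):
            total += (y - p) ** 2
            count += 1
    return total / count


def esr_loss(pred, target, eps=1e-12):
    """Error energy divided by target energy, both summed over the minibatch."""
    err_energy = sum((y - p) ** 2
                     for p_seq, y_seq in zip(pred, target)
                     for p, y in zip(p_seq, y_seq))
    sig_energy = sum(y ** 2 for y_seq in target for y in y_seq)
    return err_energy / (sig_energy + eps)  # eps is an assumed numerical guard


target = [[1.0, -1.0], [0.5, 0.5]]
silence = [[0.0, 0.0], [0.0, 0.0]]
loud_target = [[10.0 * y for y in seq] for seq in target]

print("MSE, silent model:", mse_loss(silence, target))
print("ESR, silent model:", esr_loss(silence, target))
print("MSE at 10x level: ", mse_loss(silence, loud_target))
print("ESR at 10x level: ", esr_loss(silence, loud_target))
```

A model that outputs silence scores an ESR close to 1 regardless of signal level, whereas its MSE grows with the square of the level; this normalization is the motivation for the ESR variant.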
4 Experiments
4.1 Dataset
For this paper we chose to model the Matchless DC-30™ amplifier, which can be considered a modern boutique version
of the classic Vox AC-30. The device is a point-to-point hand-wired tube-powered combo amplifier, and it has five
continuous controls (volume, bass, treble, tone cut, and master volume) which we need to recreate.
Training examples were recorded with our purpose-built data collection robot. Each example in the dataset consists
of a pair of one second long input-target audio segments, sampled at 48 kHz. The control positions are kept constant
over each segment, and stored together with the audio, as described in section 2.1. Input audio sequences were drawn
randomly from a collection of guitar and bass recordings. The dataset totals around 4.5 hours of paired audio, randomly
split into 15000 training and 1000 validation examples.
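The dataset figures above are mutually consistent, as the short check below shows: 16000 one-second segments amount to about 4.4 hours of paired audio. The `random_split` helper and its seed are hypothetical and only illustrate one way such a random split could be drawn.

```python
import random

SAMPLE_RATE = 48_000       # Hz, as stated above
SEGMENT_SECONDS = 1.0      # length of each input-target pair
N_TRAIN, N_VAL = 15_000, 1_000

samples_per_segment = int(SAMPLE_RATE * SEGMENT_SECONDS)
total_hours = (N_TRAIN + N_VAL) * SEGMENT_SECONDS / 3600


def random_split(n_total, n_val, seed=0):
    """Shuffle example indices and hold out n_val of them for validation."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = random_split(N_TRAIN + N_VAL, N_VAL)
print(samples_per_segment, "samples per segment")
print(round(total_hours, 2), "hours of paired audio")
print(len(train_idx), "training and", len(val_idx), "validation examples")
```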
To focus on the data pipeline aspect of this paper, we adapt a model already present in the literature to show the
feasibility of the modeling approach. LSTM models have already demonstrated their worth in static snapshot models
for tube amplifiers [9, 20].
To incorporate the conditioning variables into the model, we normalize them to the range [0, 1], and concatenate them
as additional input channels to the LSTM network. Based on preliminary experiments, we found that a single-layer
LSTM with 32 cells (denoted LSTM-32) provides a good balance between perceptual quality and real-time cost. For
the listening test, we trained the LSTM-32 model for 1M iterations using the ESR loss (Eq. 8) and the Adam optimizer
[26]. Figure 4 shows a comparison between the model and the reference for an example guitar signal. Figure 5 shows
the effect of the tone cut control on the model response in the time and frequency domains.
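The conditioning step described above (normalize the controls to [0, 1], then concatenate them to the audio as extra input channels) might look roughly like this. The raw 0 to 10 dial range, the example readings, and the helper names are assumptions for illustration; the recurrent network itself is omitted.

```python
def normalize_control(raw, lo, hi):
    """Map a raw control reading in [lo, hi] to the range [0, 1]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return (raw - lo) / (hi - lo)


def build_input_frames(audio, controls):
    """Concatenate normalized controls to every audio sample as extra channels."""
    for c in controls:
        if not 0.0 <= c <= 1.0:
            raise ValueError("controls must be normalized to [0, 1]")
    return [[x_t, *controls] for x_t in audio]


# Hypothetical raw readings on a 0-10 dial for the five continuous controls.
raw = {"volume": 7.0, "bass": 5.0, "treble": 4.0, "tone_cut": 2.0, "master": 10.0}
controls = [normalize_control(v, 0.0, 10.0) for v in raw.values()]

frames = build_input_frames([0.1, -0.2, 0.05], controls)
print(len(frames), "time steps with", len(frames[0]), "input channels each")
```

With five continuous controls, each time step presented to the network has six channels: one audio sample followed by the five control values.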
[Figure 4 plot axes: left panel, Time (ms) from 0.0 to 20.0; right panel, Magnitude (dB) from 0 to −70 against Frequency (Hz) from 100 to 10000.]
Figure 4: Comparison of model output (blue) to reference (black) in time domain (left) and frequency domain (right).
[Figure 5 plot axes: left panel, Time (ms) from 0 to 5; right panel, Magnitude (dB) from −20 to −100 against Frequency (Hz) from 0 to 20000.]
Figure 5: Output from the Matchless DC-30 model with various tone cut settings in time domain (left), and in frequency
domain (right).
As a computationally optimistic white-box model, we recreated the amplifier schematic in LTSpice, which can be
considered an industry-standard tool in electrical circuit simulation. This modeling approach requires a large amount of
expert work, as it involves reverse engineering the circuit schematic and measuring each of the electrical component
values on the target device. In addition to the standard LTSpice components, our SPICE model used the Koren vacuum
tube model [27] with manual adjustment of the tube section input gains to match the saturation curves measured on the
amplifier.
Test set audio samples were rendered with the LTSpice transient analysis tool using a variable step size, and resampled
to a constant sample rate for listening. In terms of fidelity, the present SPICE model represents a best-case scenario for
white-box modeling, but it is far from real-time feasibility: each of the test samples took several hours to render. While the
amplifier under investigation is still amenable to offline processing, adding more components (such as tube gain stages
for high-gain amplifiers) quickly runs into issues with quadratic scaling in the circuit node connectivity graph.
To evaluate the subjective quality of the amp models, we conducted a Difference Mean Opinion Score (DMOS) [28]
listening test. In the test, a single test case presents the listener with a reference and a test sample, and asks the listener
to rate how closely the test sample resembles the reference on a scale from 1 (bad) to 5 (excellent). The test comprised
five evaluation cases², each rated by 30 expert listeners using headphones. The test was conducted remotely using the
WebMUSHRA [29] platform modified for DMOS. Ratings were collected for the LSTM-32 model, the SPICE model, as
well as a hidden reference and a low anchor, which was produced by applying a 3.5 kHz low-pass filter on the reference
samples.
In order to grade the model generalisation to various control positions, the reference amplifier knob positions were
adjusted to fit the musical context for each evaluation case. The SPICE model was adjusted to the test configurations by
measuring the potentiometer resistances for each instance. To ensure consistency, we applied the same guitar cabinet
impulse response to all the systems, and matched the loudness of each sample using the LUFS [30] metric.
Fig. 6 presents the listening test results. Both the SPICE model and the LSTM-32 neural network model achieve
very high subjective quality. A paired t-test with Bonferroni correction for multiple comparisons indicated no
statistically significant difference between the two models, while other pairings had significant differences.
[Figure 6 plot: rating scale from 1.0 to 5.0 on the vertical axis; systems on the horizontal axis: Anchor, LSTM-32, SPICE, Ref.]
Figure 6: Listening test DMOS rating distributions, including mean ratings with 95% confidence intervals based on the
t-statistic. The results show high ratings for both an offline SPICE model, and a real-time LSTM neural network model
trained using the proposed data driven framework.
² Readers can make their own judgements on the test material at https://neural-dsp-publications.github.io/demo-page-2022/.
5 Conclusion
This paper presents a data-driven approach to creating controllable neural network models of guitar amplifiers. While
previous research on neural amp modeling has mostly constrained itself to static configuration snapshots, the present
work extends the amp modeling problem to a full range of amplifier controls, discusses a practical implementation
of data collection on physical devices, and derives a high quality neural network model from the data. Listening test
results show that the resulting neural network model can match the subjective quality of an offline white-box SPICE
circuit simulation.
References
[1] J. Pakarinen and D. T. Yeh, “A review of digital techniques for modeling vacuum-tube guitar amplifiers,” Computer
Music J., vol. 33, no. 2, pp. 85–100, 2009.
[2] R. Giampiccolo, A. Bernardini, G. Gruosso, P. Maffezzoni, and A. Sarti, “Multiphysics modeling of audio circuits
with nonlinear transformers,” Journal of the Audio Engineering Society, vol. 69, no. 6, pp. 374–388, June 2021.
[3] J. Najnudel, T. Hélie, D. Roze, and R. Müller, “Power-balanced modeling of nonlinear coils and transformers for
audio circuits,” Journal of the Audio Engineering Society, vol. 69, no. 7/8, pp. 506–516, July 2021.
[4] R. Giampiccolo, A. Natoli, A. Bernardini, and A. Sarti, “Parallel wave digital filter implementations of audio
circuits with multiple nonlinearities,” Journal of the Audio Engineering Society, vol. 70, no. 6, pp. 469–484, June
2022.
[5] U. Zölzer, Ed., DAFX: Digital Audio Effects, Wiley, second edition, 2011.
[6] F. Eichas and U. Zölzer, “Black-box modeling of distortion circuits with block-oriented models,” in Proc. DAFx,
Brno, Czech Republic, 9 2016, pp. 39–45.
[7] F. Eichas, E. Gerat, and U. Zölzer, “Virtual analog modeling of dynamic range compression systems,” in Proc. of
the AES 142nd Convention, 5 2017.
[8] F. Eichas and U. Zölzer, “Gray-box modeling of guitar amplifiers,” J. Audio Eng. Soc., vol. 66, no. 12, pp.
1006–1015, 12 2018.
[9] A. Wright, E.-P. Damskägg, V. Välimäki, et al., “Real-time black-box modelling with recurrent neural networks,”
in Int. Conference on Digital Audio Effects (DAFx), 2019.
[10] A. Wright, E.-P. Damskägg, and V. Välimäki, “Real-time black-box modelling with recurrent neural networks,”
in Proc. DAFx, Birmingham, UK, 9 2019, pp. 173–180.
[11] A. Wright, E.-P. Damskägg, L. Juvela, and V. Välimäki, “Real-time guitar amplifier emulation with deep learning,”
Applied Sciences, vol. 10, no. 3, 2 2020.
[12] T. Schmitz and J.-J. Embrechts, “Nonlinear real-time emulation of a tube amplifier with a long short time memory
neural-network,” in Proc. AES 144th Convention, 2018.
[13] E.-P. Damskägg, L. Juvela, E. Thuillier, and V. Välimäki, “Deep learning for tube amplifier emulation,” in IEEE
Int. Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 471–475.
[14] D. A. Castro-Borquez, A. T. Peussa, A. Gotsopoulos, E.-P. Damskägg, L. Juvela, K. E. A. Rauhanen, and T. W.
Sherson, “Robotic system for controlling audio systems,” Feb. 2021, EU Patent Application No. 1156974.4 -
1207.
[15] D. A. Castro-Borquez, A. T. Peussa, A. Gotsopoulos, E.-P. Damskägg, L. Juvela, K. E. A. Rauhanen, and T. W.
Sherson, “Robotic system for controlling audio systems,” Feb. 2021, US Patent Application No. 17/669,797.
[16] N. Christofides, “Worst-case analysis of a new heuristic for the travelling salesman problem,” Tech. Rep.,
Carnegie-Mellon Univ Pittsburgh Pa Management Sciences Research Group, 1976.
[17] E.-P. Damskägg, L. Juvela, V. Välimäki, et al., “Real-time modeling of audio distortion circuits with deep learning,”
in Proc. Sound and Music Computing Conference (SMC), Malaga, Spain, 2019, pp. 332–339.
[18] J. Covert and D. L. Livingston, “A vacuum-tube guitar amplifier model using a recurrent neural network,” in Proc.
IEEE Southeastcon, 2013.
[19] Z. Zhang, E. Olbrych, J. Bruchalski, T. J. McCormick, and D. L. Livingston, “A vacuum-tube guitar amplifier
model using long/short-term memory networks,” in Proc. IEEE SoutheastCon, 2018.
[20] T. Schmitz and J.-J. Embrechts, “Nonlinear real-time emulation of a tube amplifier with a long short time memory
neural-network,” in Audio Engineering Society Convention, 2018.
[21] A. Peussa, E.-P. Damskägg, T. Sherson, S. Mimilakis, L. Juvela, A. Gotsopoulos, and V. Välimäki, “Exposure
bias and state matching in recurrent neural network virtual analog models,” in Proc. Int. Conference on Digital
Audio Effects (DAFx), 2021, pp. 284–291.
[22] M. A. M. Ramírez and J. D. Reiss, “Modeling nonlinear audio effects with end-to-end deep neural networks,” in
Int. Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019, pp. 171–175.
[23] S. Nercessian, A. Sarroff, and K. J. Werner, “Lightweight and interpretable neural modeling of an audio distortion
effect using hyperconditioned differentiable biquads,” in Proc. ICASSP, 2021, pp. 890–894.
[24] J. D. Parker, F. Esqueda, and A. Bergner, “Modelling of nonlinear state-space systems using a deep neural
network,” in Proc. DAFx, 2019.
[25] C. Tallec and Y. Ollivier, “Can recurrent neural networks warp time?,” in Proc. Int. Conference on Learning
Representations (ICLR), Vancouver, Canada, 2018.
[26] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proc. ICLR, 2015.
[27] N. Koren, “Improved vacuum tube models for SPICE simulations,” Glass Audio, vol. 8, no. 5, pp. 18–27, 1996.
[28] ITU-T Rec., “P. 800: Methods for subjective determination of transmission quality,” International Telecommunication
Union, Geneva, vol. 22, 1996.
[29] M. Schoeffler, S. Bartoschek, F.-R. Stöter, M. Roess, S. Westphal, B. Edler, and J. Herre, “WebMUSHRA – A
comprehensive framework for web-based listening tests,” J. Open Research Software, vol. 6, no. 1, 2 2018.
[30] ITU-R Rec., “BS. 1770: Algorithms to measure audio programme loudness and true-peak audio level,” International
Telecommunication Union, Geneva, 2015.