Gesture Analysis of Bow Strokes Using An Augmented Violin
Nicolas Hainiandry Rasamimanana
DEA ATIAM internship thesis, academic year 2003-2004
Université Pierre et Marie Curie, Paris VI
Host laboratory: Ircam - Applications Temps Réel
Supervisor: Frédéric Bevilacqua
List of Figures
Accelerometer signals in the bowing direction after subtraction of static
acceleration. The signals are biphasic between positive and negative values.
Some differences can be observed in the amount of deceleration, its
distribution over time and the moment at which it occurs. The abscissae
represent the sample number, with a sampling frequency of 250 Hz.
Acceleration, integrated speed and integrated position signals. Speed and
position are integrated by summing the acceleration samples with a zero
offset. The abscissae represent the sample number, with a sampling frequency
of 250 Hz.
Abstract
At Ircam, we are currently developing an augmented violin. Such an augmented
instrument is attractive in computer music because it offers a large
diversity of sounds and nuances. In order to study how this ability can be
used to obtain subtle, continuous control over musical processes (e.g. sound
synthesis), we have placed a gesture sensing system on a carbon fiber violin
bow. With this equipment, we have analyzed in particular the bow speed, which
according to instrument acoustics is one of the parameters that most
influence the sound of a bowed string instrument. In this study, we have
focused on four different types of bow strokes: détaché, martelé, piqué and
spiccato. The analysis of their gestural data has resulted in a set of
features corresponding to the points of interest in the speed curve, i.e.
extrema and inflection points. We have tested these features on a bow stroke
database that we built by applying our segmentation algorithm to recorded
measurements. The features show strong invariance properties for a single
violinist and between two different violin players, including Jeanne-Marie
Conquer. The behavior of the features is also relevant with respect to
gesture variations, especially changes of nuance and tempo. Moreover, this
feature space is consistent with acoustics studies that have shown the
influence of bow speed on the spectral characteristics of sound: since the
features are extracted from the temporal speed curve, we can deduce some
perceptual properties of the space they generate.
Résumé
We are currently developing an augmented violin at Ircam. Such an instrument
is particularly attractive in the field of computer music because it offers a
wide range of sounds and nuances. In order to study how this property can be
used to control musical processes finely and continuously, we placed a
gesture capture system on a carbon fiber violin bow. Using this system, we
studied in particular the bow speed, one of the most influential parameters
on sound in the playing of a bowed string instrument, according to instrument
acoustics. In this study, we focused notably on four different types of bow
strokes: détaché, martelé, piqué and spiccato. The analysis of their gestural
data made it possible to build a set of descriptors corresponding to the
points of interest of the speed curve, i.e. the extrema and the inflection
points. We tested these descriptors on a database of bow strokes that we
built by applying our segmentation algorithm to recorded measurements. The
descriptors show strong invariance properties for one and for two violinists
(including Jeanne-Marie Conquer). The behavior of these descriptors is also
relevant with respect to variations of the gestures, in particular changes of
nuance and tempo. Moreover, the space generated by these descriptors is
compatible with the results of acoustic studies showing the influence of bow
speed on the spectral properties of sound: since the gestural descriptors are
extracted from the temporal speed curve, perceptual properties can be deduced
for the space they generate.
Acknowledgments
I would like to acknowledge:
Norbert Schnell, who welcomed me in the Applications Temps Réel research team,
Frédéric Bevilacqua, my supervisor, for the fruitful discussions we had,
Emmanuel Fléty, for the tremendous work he did to build the augmented
violin prototype,
Alain Terrier, who made the carbon clip we use to fix the electronics
boards on the bow,
Diemo, for always answering my questions on Matlab and LaTeX,
Gérard Assayag and Cyrille Defaye, for their availability throughout the
year.
I would also like to acknowledge the artists who contribute to the project,
in particular the composers Philippe Manoury, Florence Baschet, Franck
Bedrossian and Jérôme Combier, and to address special thanks to professional
violinists Hae Sun Kang and Jeanne-Marie Conquer for their gesture recordings.
And last but not least, I would like to acknowledge all my classmates
from DEA ATIAM for their kindness, with a special note to Arshia, my
office-mate, who had to put up with me more than the others. Arshia, you
are right, we had a nice time.
Introduction
This work was done in the context of Ircam's interest in movement analysis.
The objective is to find different means of interaction with a computer
that involve gestures. Here, we focus on the study of violin bowing
techniques. The subtleties involved in playing a bowed instrument should
lead to a particularly rich control interface. This research project is also
motivated by a group of composers, notably Philippe Manoury, Florence
Baschet, Franck Bedrossian and Jérôme Combier, who intend to use the results
in future pieces and who, as a consequence, actively participate in its
development.
The gesture analysis is performed via a sensing system mounted on a
carbon fiber violin bow. Various signals, ranging from the bow position to
the downward force on the strings, are produced and taken as input. There
are two distinct ways of using these continuous signals. The first consists
in a direct use of the data with minimal interpretation: the data can then
serve as parameters to control a physical model synthesis, e.g. that of a
bowed string. The other way tries to interpret the signals in order to
determine some higher level information, e.g. that the performed bow stroke
is staccato.
The former is rather straightforward, in that the main difficulty resides in
feeding a physical model synthesis module with the right parameters, while
the latter demands deeper analysis but provides a better understanding of
the behavior of the sensor signals. Musical applications can therefore take
advantage of this knowledge and, for example, follow the performer's gesture
in the same way as score following does with audio, which is of major
interest for triggering events in mixed pieces, or anticipate the violinist's
movements.
These two applications pose the problem of identifying objects in the
violinist's gesture and in the sensor signals. In the same manner that score
following relies on tones, whose characteristics are notably pitch and
duration, we here have to determine what is to be followed or anticipated.
Therefore, from these continuous sensor signals, we have to extract relevant
pieces of information on the bowing. In other words, the eventual aim is to
identify patterns and invariants.
The work done here focuses on the study of a set of different bow strokes
Chapter 1
Introduction
A musical instrument can be defined as the meeting point between art and
technology. Technology, and more particularly computer science, has grown
exponentially over the last fifty years, and so has the desire to create new
instruments that exploit these developments. These instruments, often called
Digital Music Instruments (DMI), can be divided into two categories: the new
music interfaces and the augmented instruments.
The first category involves new controllers made from scratch using various
sensors, e.g. accelerometers or force sensing resistors, which use the
gestural data with a low level of interpretation by mapping them to sounds.
This approach is particularly attractive as it generally offers a
straightforward interface. However, the long and complex learning required by
a traditional instrument is often replaced by the search for appropriate
mappings between gesture and sound, which can be a hard task because these
correspondences are completely open. Moreover, the simplicity of these new
interfaces turns out to be one of their main drawbacks too, as it often means
a less expressive interface. Another important point is the presence of
haptic feedback, which plays an important role in the relationship between a
performer and his/her instrument. Indeed, taking into account the physical
response to a gesture is a decisive element in the mastering of an
instrument. In this first approach, this aspect is still under development in
order to offer something comparable to the feel of an instrument. However,
there are already a number of fascinating works, like the stage performances,
well worth seeing, of Atau Tanaka with the BioMuse [18]. Bioelectrical
signals, and particularly electromyograms of his arm muscles, are digitized
and mapped to sounds and images. The movements of his body are therefore
directly interpreted to create music. Atau Tanaka underlines that although
BioMuse is not a mechanical instrument because there is no material object
1.2
by Tod Machover. In this piece, the HyperBow is used to control the
activation and the alteration of sounds and effects on an electric violin,
using a selection of chosen gestures. The HyperBow has also been used to
evaluate the playability of various physical models of friction-based
instruments, including a violin [26], [16], a Tibetan singing bowl, a musical
saw, a glass harmonica and a bowed cymbal [17]. In these studies, means of
expressively controlling the models with the HyperBow are investigated.
In 1998 [13] and 2000 [14], Bernd Schoner et al. used probabilistic
techniques to infer, in real time, violin sounds from the gesture input given
by the HyperCello bow, the previous version of the HyperBow. They use
cluster-weighted modeling to predict the pitch and amplitude of the sound
from the gestural data, and try to extend the inference model to the
spectrum, with rather good results in sustained parts but mixed results in
the transitions.
In 1999, Perry Cook and Dan Trueman [19] built the BoSSA
(Bowed-Sensor-Speaker-Array), a new instrument based on the different
elements constituting a violin. Its bow, the R-Bow, is very close to that of
Tod Machover: it is a standard violin bow fitted with pressure sensors (force
sensing resistors, FSRs) and a dual-axis accelerometer which measures both
angular position and changes of velocity. The BoSSA Fangerbored is a
fingerboard augmented with a linear position sensor, four FSRs used by the
right hand to trigger events, and another dual-axis accelerometer. The
strings are replaced by an array of four pieces of foam-covered wood
(sponges), resting freely between two fixed FSRs, which can be bowed like
real strings. The resonating body is replaced by a spherical multi-channel
speaker array, which can reconstruct the radiative timbral qualities of
violins in a traditional acoustic space. The proximity between input and
output gives BoSSA a playing feel similar to that of an acoustic instrument,
together with the flexibility of software synthesis and signal processing
techniques. The BoSSA was used in The Lobster Quadrille, a piece composed by
Dan Trueman, where the bow gesture data is used to control a comb filter
vibrato.
In 2001, Camille Goudeseune's eViolin [5] tracks the violinist's movements by
means of an electromagnetic field. The sensing system is composed of two
sensors placed on a five-string electric violin and of an antenna that emits
a time-varying magnetic field and is placed about one meter away. The spatial
position of the violin is mapped to timbre according to perceptual
dimensions: spectral brightness as a function of latitude and spectral
richness as a function of longitude. The third spatial dimension is not
mapped to a third perceptual dimension because of the inconvenience of
playing at abnormal altitudes. It is rather used as a toggle switch (octave
switch) or as a continuous scaling factor, for example for the amount of
reverberation. Note that in the eViolin applications [7] the electric violin
is associated with an output array of speakers similar in efficiency to that
of BoSSA [19], in an apparently cheaper version.
Since 2000, Charles Nichols [10] has developed two versions of the vBow, a
virtual violin bow haptic human-computer interface. The bow is custom-made of
fiberglass and is linked to servomotors in order to sense the bow position
and provide haptic feedback. The data are then used to drive a physical model
synthesis.
In the CyberViolin project (2003), Chad Peiper et al. [11] use an
electromagnetic system to record the position of two sensors mounted on a
violin bow. They offer a higher-level interpretation of the gesture data by
extracting features to classify the different violin bow strokes using a
decision tree. The features are the bow position at the beginning and at the
end of the stroke, the bow speed, the frequency of bow changes, the
acceleration or deceleration within a stroke, the continuity of motion
between strokes and the lack of movement within a stroke. These features are
then provided to a decision tree for training and recognition. The
performance of the classification process is limited by the inaccuracy of the
sensing system (sensor errors, resolution and sampling frequency) and should
be improved by additional features in the decision tree. The interpreted
CyberViolin data are used in a 3D graphical environment in two ways: as a
representation of the performed bow strokes, which gives the violinist
real-time feedback, and as a means of interacting with a program using the
bow instead of a traditional pointing device.
1.3
Ircam has a strong background in mixed pieces, where an acoustic instrument
and a computer perform together. Philippe Manoury's piece Jupiter for flute
and real-time electronics, composed in 1987, is a pioneering work in the
field of interaction between a live instrument and a computer. The 4X, which
was designed by Giuseppe Di Giugno, used audio and score following to
interact with the performer, while the flute, conceived by Larry Beauregard,
was augmented so that its fingering could be detected and the instrument used
as a control input device.
Several studies have already been carried out at Ircam on gestural control,
with some major works by Marcelo Wanderley in [20] and [21]. In 2000, Emily
Morin studied the similarities and differences between different cellists'
ways of bowing using a DataGlove, [9] and [8]. The FSRs located at each
contact point between the right hand and the bow showed some repeatable
patterns that could be used in a recognition process, although the pressure
of the fingers on the bow stick is not a very reliable parameter.
In 1996, Suguru Goto built le SuperPolm with the collaboration of
Patrice Pierrot. Le SuperPolm is a control interface that is based on a
violin but has neither strings nor bow hair. It is played in a similar way to
a violin so that the body movements can be recorded and used as input data
Chapter 2
2.1
2.1.1
Position Sensor
The bow positions (relative to the bridge and from tip to frog) are deduced
from the same electromagnetic position sensor. The method is based on
capacitive coupling between the bow and a square-shaped antenna placed
behind the violin bridge. To this end, the bow is covered with a resistive
material, here a piece of magnetic ribbon from a video tape, that runs along
the stick. Two electric signals are sent from the two ends of the bow at
different frequencies (50 kHz and 100 kHz) and are gradually attenuated
along the bow stick by the resistance. The tip and frog signals are received
as a single electrical signal and demodulated with a low-pass and a high-pass
filter.
Figure 2.1: The violin sensing system. The sensing system is mounted on a
carbon fiber bow. The clip under the bow contains the RF transmitter, the
electronics board holding the micro-controller and the accelerometers. The
FSR is placed on top of the metallic grip. The position sensor system
includes the resistive tape (starting from the copper clip) and the antenna
mounted behind the bridge.
The positions are then computed according to the strength of the two
signals. Ideally, both electric signals should decrease linearly with the same
slope along the resistance. With this assumption, the bow-bridge distance
and the tip-to-frog distance can be deduced by the following equations:
\[
\text{bow\_bridge\_dist} = \frac{1}{\text{tip} + \text{frog}}
\quad \text{and} \quad
\text{tip\_frog\_dist} = \frac{\text{tip} - \text{frog}}{\text{tip} + \text{frog}}
\]
where tip [resp. frog] is the strength of the signal emitted from the tip [resp.
frog].
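As a minimal sketch of this computation, assuming the ideal linear attenuation and taking the bow-bridge distance as 1/(tip + frog) and the tip-to-frog position as (tip - frog)/(tip + frog), the two quantities could be derived from the demodulated strengths as follows. The function name `bow_positions` and the uncalibrated output units are illustrative assumptions, not part of the sensing system.

```python
def bow_positions(tip: float, frog: float) -> tuple[float, float]:
    """Return (bow_bridge_dist, tip_frog_dist) in arbitrary, uncalibrated units.

    tip and frog are the demodulated strengths of the two bow signals.
    Hypothetical helper: the names and the scaling are assumptions.
    """
    total = tip + frog
    if total <= 0:
        raise ValueError("no signal received from the bow")
    bow_bridge_dist = 1.0 / total          # weaker total signal = farther from the antenna
    tip_frog_dist = (tip - frog) / total   # normalised difference, between -1 and +1
    return bow_bridge_dist, tip_frog_dist


# Equal strengths: the contact point sits halfway between tip and frog.
print(bow_positions(0.5, 0.5))    # (1.0, 0.0)
# Stronger tip signal: the contact point is closer to the tip.
print(bow_positions(0.75, 0.25))  # (1.0, 0.5)
```

Normalising by the sum makes the tip-to-frog estimate independent of the overall signal level, which is why the same pair of signals can serve both measurements.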
2.1.2
Acceleration Sensor
The physical principle behind an accelerometer is that of a mass-spring
system (figure 2.2). For each axis, the accelerometer therefore measures the
displacement of the mass from its rest position, i.e. with the spring neither
stretched nor compressed. Friction is compensated for by signal conditioning
circuitry present in the device. Therefore, the acceleration of the mass is
roughly proportional to its displacement.
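The proportionality can be made explicit with a short worked equation. This is a sketch under the stated idealisation (friction neglected, linear spring); the symbols m (mass), k (spring stiffness) and x (displacement from rest) are notations introduced here for illustration.

```latex
% Spring force on the displaced mass (Hooke's law) balanced against inertia:
%   m a = k x
% hence the acceleration magnitude is proportional to the displacement:
\[
  m\,a = k\,x
  \quad\Longrightarrow\quad
  a = \frac{k}{m}\,x
\]
```

Measuring x on each axis therefore gives the acceleration up to the constant factor k/m.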
2.1.3
2.2
The data recorded by the sensing system need to be digitized and sent to a
workstation. The accelerometer and FSR data are sent to a sensor acquisition
system via a radio frequency (RF) transmitter (see figure 2.3), while the
position data received by the antenna behind the violin bridge are sent via a
cable: that way, the augmented bow remains wireless, which is of the utmost
importance in order to preserve the violinist's way of playing.
Figure 2.3: The electronics components mounted on the carbon fiber bow.
The digitization device, EtherSense, has been developed by Emmanuel
Fléty et al. [3]. EtherSense performs the digitization on 16 bits at 250 Hz
for the RF transmitter data and at 1000 Hz for the position, while in Diana
Young's system [24] the data are digitized on 8 bits at 41 Hz for
acceleration and strain and at 142 Hz for position.
The data are then sent via Ethernet to Max/MSP using the OSC protocol,
where we can record audio and gesture data simultaneously. In a second
Chapter 3
3.1
Bow stroke: Description
Détaché: Each note is played in a separate stroke, with a rather constant
speed along the bow and with a more or less smooth attack.
Martelé: Strong acceleration at the beginning of each stroke with an abrupt
stop between strokes, which gives the stroke a sharp, almost percussive
attack.
Piqué: Each note is preceded by a pressure on the bow, an accent is given to
the note, and the attack is sharp but smoother than for martelé. The bow may
leave the string.
Spiccato: Each note is attacked from above the string. The bow describes a
sort of parabola and strikes the string when arriving at the lowest point of
the parabola. The violinist plays spiccato around the bow equilibrium point,
i.e. around the first third from the frog.
Staccato: Succession of accented notes in one bow, with the bow stopping
briefly between two notes. Flying staccato implies the bow leaving the
strings and is usually used in arpeggios.
Ricochet: The bow is released from above the strings and bounces with
decreasing intensity and time intervals according to its physical
properties.
Jetato: The bow is released from above the strings but bounces with constant
time intervals so as to achieve a given rhythm.
Saltelato: The violinist performs a short détaché bowing around the
equilibrium point, i.e. around the first third from the frog, and the bow
naturally bounces under speed. Requires a minimum speed so that the bow can
bounce.
Tremolo: Each note is divided into several very short bow strokes played
détaché, without the bow bouncing off the string.
3.2
Describing the different bow strokes is not an easy task. Indeed, there are
different schools of violin technique, and therefore different ways of
performing a bow stroke. The schools do not agree on the terms to use, and
therefore on the classification of bow strokes. For example, piqué and
spiccato may be considered the same bow stroke. On top of that, a single
violinist can perform a bow stroke with many subtle differences for
expressive reasons.
Nuance plays a significant role in this variability. Indeed, a quarter note
détaché forte will be played with a higher bow speed, and therefore with
more bow length, than the same note piano, which is of prime importance
considering our sensing system. In the same manner, the tempo influences
the gesture to perform for a given bow stroke. More generally, we can say
that the context of a musical phrase can significantly modify a bow stroke,
a modification totally controlled by expert players.
However, without knowledge of the musical context, where is the boundary
between martelé piano and piqué forte? There seems to be no clear frontier
between the different strokes. Therefore, our representation space should
reflect the possibility of going continuously from one bow stroke to
another, e.g. from détaché to martelé or spiccato.
The notion of invariance underlies this variability. We would therefore be
interested in extracting the essential information relative to the execution
of a bow stroke, which would be common to different violinists. To what
extent can this be done, considering that each violinist has a different
body constitution and has adapted his/her technique to it? Violin acoustics,
which studies the interactions between the bow and the strings and their
consequences on the sound spectrum, may bring some elements of response to
this invariance issue.
3.3
The most widely accepted model of the bowed string is the Helmholtz kink
motion. When a bowed string oscillates in steady state, the string can be
modeled by two straight lines connected by a kink that travels along a
parabolic path. The interaction between the bow and the string switches
between two states: stick and slip. The sticking phase corresponds to the
interval when the kink travels between the bow and the nut, during which the
string is stuck to the bow hair and therefore takes the speed of the bow. The
slipping phase corresponds to the interval when the kink travels between the
bow and the bridge, during which the string moves in the opposite direction
to that of the bow.
In 1973, J.C. Schelleng [12] established a diagram representing the region
Chapter 4
4.1
We have seen from the studies on acoustics that the parameters of interest
include bow speed, bow-bridge distance and bow force on the strings. The
sensing system should therefore measure these quantities. However, the
sensors are not perfect: they may not give direct access to the desired
parameter, they add noise and they have a finite resolution. This part
quantifies the sensor performance in the context of the augmented violin bow.
4.1.1
In chapter 2, we have seen that the physical principle governing an
accelerometer is that of a mass-spring system (one for each axis): the sensor
measures the displacement of a mass connected to a spring in a specified
direction. As a consequence, the data given by an accelerometer are not a
true measure of the changes of speed of the system. Indeed, the sensor also
measures the angle between the accelerometer axis and the direction of
gravity, often referred to as static acceleration: this term is debatable,
but as it is commonly used in accelerometer technical specifications, we will
use it. The variations of speed are called dynamic acceleration. High-pass /
low-pass filtering cannot separate them in all cases, as a violin player can
change string during a bow stroke or perform several bow strokes on a single
string. In the first case, static acceleration evolves faster than dynamic
acceleration, while in the second the opposite holds. This double measure is
a direct consequence of the physical properties of the accelerometer, and is
problematic as the data given by this sensor are the combination of two
unknowns: measuring the acceleration
4.1.2
The main issue here is the implementation of the position sensor. We need
to build a homogeneous resistance of the size of the bow stick (60 cm x
5 mm), with an impedance sufficiently low that the tip and frog signals
decrease gradually along the bow, and sufficiently high that the bow does not
radiate. The magnetic ribbon made from a video tape still needs some
adjustments in order to satisfy this double constraint.
In practice, the electric signals do not decrease linearly along the bow
length, as would be needed to compute the tip-to-frog distance and the
bow-bridge distance using the equations given in chapter 2, but decrease
strongly and exponentially, so that the amplitude of each signal is no longer
significant beyond the middle of the bow. Therefore, both bow positions are
problematic to compute for the moment, because we either get an electric
signal from only one end (when next to the tip or the frog) or we get both
electrical signals with low and noisy amplitude (around the middle of the
bow, for about 10 cm).
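To illustrate why an exponential decrease is troublesome, here is a worked equation under an idealised assumption that is not taken from the measurements: both signals decay with the same constant, and the position is estimated from the normalised difference (tip - frog)/(tip + frog). The symbols A (emitted amplitude), \alpha (attenuation constant), L (bow length) and x (distance of the measured point from the tip) are introduced here for illustration.

```latex
% Assumed signal model: same amplitude A and attenuation constant \alpha
% for both signals, x measured from the tip, L the bow length.
\[
  \mathrm{tip}(x) = A\,e^{-\alpha x},
  \qquad
  \mathrm{frog}(x) = A\,e^{-\alpha (L - x)}
\]
% Multiplying numerator and denominator by e^{\alpha L / 2} / A:
\[
  \frac{\mathrm{tip} - \mathrm{frog}}{\mathrm{tip} + \mathrm{frog}}
  = \frac{e^{\alpha (L/2 - x)} - e^{-\alpha (L/2 - x)}}
         {e^{\alpha (L/2 - x)} + e^{-\alpha (L/2 - x)}}
  = \tanh\!\bigl(\alpha\,(L/2 - x)\bigr)
\]
```

Under this model the ratio is no longer linear in x: for a strong attenuation it saturates near +1 or -1 as soon as the measured point moves away from the middle of the bow, where only one signal is significant, and it varies quickly only around the middle, where both amplitudes are low.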
4.1.3
We chose to measure, by means of a force sensing resistor, the force exerted
by the forefinger on the bow stick in place of the force of the bow on the
strings. The weakness of this system is that the FSR signal is not highly
correlated with the force of the bow on the strings: one can exert a downward
force by explicitly using the forefinger, but one can also exert a downward
force without the help of the forefinger, using the other fingers and the
weight of the hand. Moreover, this parameter varies significantly, even
between violin players of the same skill level, according to the technique
they have developed from their body constraints and their instruments.
4.2
Noise estimation
Var     SNR (dB)   Error (%)   Comments
33.8    25.3       0.29        vertical bow, tip up
29.6    27.0       0.19        horizontal bow
28.9    28.0       0.16        vertical bow, tip down
4.3
4.3.1
In table 4.2, we have computed the effective bit resolution for static
acceleration, i.e. the angle of inclination between the bow stick and
gravity, according to the signal range and noise. The error value corresponds
to the maximum error measured with the bow held still. As a matter of fact,
the resolution is under-estimated.
Min      Max      Range    Error value   Resolution
11500    18300    6800     33            206
Table 4.2: Range and resolution for static acceleration in the accelerometer
bowing direction
4.3.2
Dynamic Acceleration
Max      Range    Error value   Resolution
30000    30000    29 to 33      900 to 1035
Table 4.3: Dynamic range and resolution for dynamic acceleration in the
bowing direction
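The resolution figures in tables 4.2 and 4.3 are consistent with dividing the signal range by the error value. The sketch below reproduces that arithmetic; the interpretation of "resolution" as range divided by error, the conversion to bits with a base-2 logarithm and the function name `resolution` are assumptions made for illustration.

```python
import math


def resolution(signal_range: float, error: float) -> float:
    """Number of distinguishable steps, assumed here to be range / error."""
    return signal_range / error


# Static acceleration (table 4.2): range = max - min, error value = 33.
static_steps = resolution(18300 - 11500, 33)

# Dynamic acceleration (table 4.3): range = 30000, error value from 29 to 33.
dynamic_low = resolution(30000, 33)
dynamic_high = resolution(30000, 29)

print(round(static_steps))                      # 206
print(round(dynamic_low), round(dynamic_high))  # 909 1034

# Equivalent number of bits for the static case (assumed definition).
print(round(math.log2(static_steps), 1))        # 7.7
```

With this reading, the static measurement carries a little under 8 effective bits out of the 16 bits of the digitization.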
4.3.3
Velocity computation
Acoustics has shown that there is a relationship between bow speed and
sound spectrum, which makes bow speed a parameter to consider in the study
of the variety of bow strokes. There are two ways to compute this speed: from
the position signal or from the accelerometer signal. Differentiating the
position is the most natural way of obtaining the bow speed. However, we have
seen the technical issues in building an accurate position sensor and the
need for further adjustments. Integrating the acceleration poses the problem
of the unknown initial speed. The other problem is that the accelerometer
signal is the combination of both static and dynamic accelerations.
In spite of these problems, it is still possible to extract some information
on the bow speed. Even if the tip and frog signals do not decrease as
expected, we can still differentiate them. However, the accuracy of this
operation remains to be quantified. The integration of the acceleration
signal with a zero offset can give the global shape of the speed waveform.
However, the integration of digital signals is done by summing all samples,
which can result in an accumulation of errors. In addition, this technique
only holds as long as the violinist does not change the static acceleration
during a bow stroke: such a change implies a shift of the data that does not
correspond to a change of speed.
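The integration by summation and its sensitivity to an uncorrected offset can be illustrated with a small sketch. The stroke below is synthetic and the function name `integrate` is hypothetical; only the 250 Hz sampling rate comes from the text.

```python
from itertools import accumulate

FS = 250  # Hz, sampling rate of the accelerometer data


def integrate(samples, fs=FS):
    """Cumulative sum of the samples scaled by the sample period (zero initial value)."""
    dt = 1.0 / fs
    return [s * dt for s in accumulate(samples)]


# Synthetic biphasic stroke: accelerate for 50 samples, decelerate for 50 samples.
accel = [1.0] * 50 + [-1.0] * 50
speed = integrate(accel)
print(speed[-1])  # 0.0: the bow is at rest again at the end of the stroke

# Same stroke with a small residual static offset on every sample.
biased = [a + 0.1 for a in accel]
speed_biased = integrate(biased)
print(speed_biased[-1])  # close to 0.04 instead of 0: the offset accumulates
```

The second result shows the error accumulation mentioned above: a constant offset in the acceleration becomes a linear drift in the integrated speed, and it would become a quadratic drift in an integrated position.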
Chapter 5
5.1
Signal Models
5.1.1
Acceleration
5.1.2
5.1.3
[Figure with four panels: (a) Détaché, (b) Martelé, (c) Piqué, (d) Spiccato.]
5.2
Segmentation
5.2.1
Segmentation objectives
The sensing system records the violin player's bowing gestures. We therefore
receive a set of time series, one for each sensor, from which we try to
extract information. In the recordings we made, each bow stroke is played
repeatedly. The very first task that needs to be done in order to
characterize the bow strokes is to segment the time series according to each
5.2.2
The most straightforward idea is to automatically segment the data according
to the speed signal. Indeed, whatever the bow stroke, the bow speed shifts
from +speed_b to -speed_b between downbow and upbow. As there is no speed
sensor on the system, the speed has to be computed either from the position
sensor signal or from the accelerometer signal. As we underlined, both
approaches have their difficulties. We also discussed the implementation
issue of the position sensor, i.e. building a resistance of the size of the
bow with the right impedance. Thus, the first analyses we performed were done
without the position sensor, which needed improvements. As a consequence, we
focused on the accelerometer signal in the bowing direction, and implemented
a segmentation algorithm based on thresholds and peak detection.
The accelerometer signal is the combination of static and dynamic
accelerations. This is a problem if we want to use a threshold method to
segment the data. Indeed, high-pass / low-pass filtering cannot separate them
in all cases, as a violin player can change string during a bow stroke or
perform several bow strokes on a single string. In the first case, static
acceleration evolves faster than dynamic acceleration, while in the second
the opposite holds.
Using the acceleration signal, we did not try to implement a threshold
algorithm that would work in all cases, because this would have required many
heuristics and would therefore not have been very robust. We instead reduced
the number of possible cases by asking the violin player not to change string
during the recording of a series of bow strokes. This choice did not prevent
us from performing measurements on all strings but eased the subtraction of
static acceleration.
5.2.3 Segmentation procedure
We already observed different behaviors of the dynamic acceleration depending
on the bow stroke. Downbows have an acceleration peak at the beginning, are
biphasic between positive and negative values, and some show strong
deceleration peaks; the signs are inverted for upbows. The segmentation
algorithm must therefore treat this deceleration peak as part of the downbow
and not as the beginning of the following upbow (fig. 5.4).
The segmentation algorithm is a two-step process. It first thresholds the
signal with a user-given value in order to cancel the low-amplitude variations
of the signal while keeping the sharp peaks. Then, by differentiating the array
of time instants returned by the thresholding step, we find the bow stroke
change instants using a second threshold related to the minimum possible
interval between two peaks: this value depends on tempo and must therefore be
chosen of the same order of magnitude as the interval between bow strokes at
that tempo.
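The two steps described above can be sketched as follows. This is only an illustrative reading of the procedure, not the code used for the measurements: the function name `find_bow_changes`, the use of the absolute value for the first threshold, and the parameter names `threshold` and `min_gap` (in samples) are assumptions.

```python
import numpy as np

def find_bow_changes(signal, threshold, min_gap):
    """Return the sample indices at which a new bow stroke is assumed to start.

    signal    : dynamic acceleration (static component already removed)
    threshold : amplitude threshold that keeps only the sharp peaks
    min_gap   : minimum number of samples separating two bow strokes
    """
    signal = np.asarray(signal, dtype=float)

    # Step 1: keep only the time instants where the signal exceeds the
    # threshold (assumption: the test is made on the absolute value).
    instants = np.flatnonzero(np.abs(signal) > threshold)
    if instants.size == 0:
        return instants

    # Step 2: differentiate the array of instants. A jump larger than
    # min_gap separates two groups of peaks, i.e. two bow strokes, so a
    # deceleration peak close to its attack stays in the same stroke.
    gaps = np.diff(instants)
    group_starts = np.concatenate(([0], np.flatnonzero(gaps > min_gap) + 1))
    return instants[group_starts]
```

As an order of magnitude, and assuming one bow stroke per beat, strokes played at 60 bpm and sampled at the 250 Hz quoted in the figure captions are about 250 samples apart, so `min_gap` would be chosen somewhat below that value.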
The bow strokes are all performed on a single string. We chose to estimate
static acceleration as the mean of the accelerometer output over a window
containing 10 to 100 bow strokes. This estimate is convenient because the same
blind processing can be applied to all bow strokes. Figure 5.5 shows the time
series for bow strokes performed at 60 bpm. Static acceleration has been removed
by subtracting the mean value over the whole window. The first column shows the
raw dynamic acceleration, the second column the acceleration filtered with a
Hann window of size 64 (which corresponds to about 250 ms) and thresholded, and
the last column the same filtered, thresholded acceleration with markers at each
bow change.
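The preprocessing described in this paragraph (mean subtraction, then smoothing with a Hann window) can be illustrated with the short sketch below. The function names `remove_static` and `hann_smooth` are hypothetical, and normalizing the window to unit sum is an assumption made here so that the smoothed signal keeps the scale of the input.

```python
import numpy as np

def remove_static(raw):
    """Estimate static acceleration as the mean over the whole window
    and subtract it, leaving the dynamic acceleration."""
    raw = np.asarray(raw, dtype=float)
    return raw - raw.mean()

def hann_smooth(signal, size=64):
    """Smooth a signal by convolution with a Hann window of `size` samples.

    The window is normalized to unit sum (assumption). The input is expected
    to be longer than the window.
    """
    window = np.hanning(size)
    window = window / window.sum()
    return np.convolve(np.asarray(signal, dtype=float), window, mode="same")
```

With the 250 Hz sampling rate quoted in the figure captions, a 64-sample window spans 64 / 250 = 0.256 s, which matches the value of about 250 ms given in the text.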
Because the dynamic acceleration signals of détaché, martelé, piqué and
spiccato differ, the threshold values had to be adjusted carefully for each of
them. However, once the appropriate range of thresholds is found, the algorithm
can segment the data at different nuances and tempi, and it makes it possible to
build a larger bow stroke database than purely manual segmentation would.
The main objective of this segmentation algorithm is to help us constitute
bigger databases in order to study the invariance and variability of the bow
strokes. However, segmenting the bow strokes is already a relevant operation for
several applications. Indeed, a real-time version of a robust segmentation
algorithm could be used to track the performer's movements in a
score-following-like application, or to trigger events in a mixed piece. Having
access to the zero crossings of the bow speed may be of great help to achieve
segmentation in real time.
5.3 Features
From the signal modeling of the bow strokes, we extracted six simple
features related to the bow speed and its variations. Considering the dynamic
acceleration on a window corresponding to the stroke duration, we compute:
- the maximum acceleration, $a_{max}$
- the minimum acceleration, $a_{min}$
- the time of maximum acceleration, $t_{max}$
- the time of minimum acceleration, $t_{min}$
- the speed after maximum acceleration: maximum speed, $v_1$
- the speed after the attack, $v_2$
These features are simple. Considering the evolution of speed during a bow
stroke, they correspond to the most common analytic parameters, in a
mathematical sense, that can be extracted from the curve (extrema, inflection
points and time interval) (fig. 5.6). They impose some geometric constraints on
the temporal speed curve.
From a violinist's and a physical point of view, the features can be
interpreted as follows:
- $v_1$ is the speed of attack
- $v_2$ is the speed after the attack
- $a_{min}$ and $a_{max}$ are related to the sharpness of the attack
- $t_{min}$ and $t_{max}$ are related to the time of the attack
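As an illustration, the acceleration extrema, their times and the maximum speed could be extracted from one segmented stroke along the following lines. This is a minimal sketch under stated assumptions: the function name `stroke_features` is hypothetical, the stroke is assumed to be oriented so that the attack is a positive acceleration peak (an upbow would be sign-flipped first), speed is obtained by cumulative summation from zero, and $v_2$ is left out because its exact computation is not specified in this section.

```python
import numpy as np

def stroke_features(acc, fs=250.0):
    """Extract simple features from the dynamic acceleration of one stroke.

    acc : dynamic acceleration samples of a single bow stroke
    fs  : sampling frequency in Hz (250 Hz is the rate quoted in the captions)
    """
    acc = np.asarray(acc, dtype=float)
    i_max = int(np.argmax(acc))   # index of the acceleration peak
    i_min = int(np.argmin(acc))   # index of the deceleration peak

    # Speed by summing the acceleration samples, starting from zero.
    speed = np.cumsum(acc) / fs

    return {
        "a_max": float(acc[i_max]),
        "a_min": float(acc[i_min]),
        "t_max": i_max / fs,          # time of maximum acceleration (s)
        "t_min": i_min / fs,          # time of minimum acceleration (s)
        "v1": float(speed.max()),     # maximum speed reached in the stroke
    }
```

Dividing the cumulative sum by `fs` only rescales the speed; summing the raw samples directly gives the same curve shape in sensor units.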
Another point is that these features are all extracted from the shape of the
bow speed. We have discussed the correlation of these parameters with the audio
signal. These features therefore appear relevant to our interest in the subtle
variations of timbre and nuance that can be produced on a violin. We can also
notice that, since they have a physical interpretation, they make conceptual
sense in terms of playing technique from a violin player's point of view. We
combined these features in order to consider relative values instead of
absolute values. The set of combined features is therefore:
- the maximum speed, $v_1$
- the normed speed, $v_2 \left| \frac{a_{min}}{a_{max}} \right|$
- the time interval between the acceleration extrema, $\Delta t = |t_{max} - t_{min}|$
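The combination of the raw features into relative values can be written in a few lines. The sketch below assumes the three coordinates used later for the feature space, $v_1$, $v_2 |a_{min}/a_{max}|$ and $\Delta t = |t_{max} - t_{min}|$; the function name `combined_features` is hypothetical.

```python
def combined_features(v1, v2, a_min, a_max, t_min, t_max):
    """Combine raw stroke features into the three relative coordinates.

    Returns (v1, normed speed, delta_t). a_max is assumed to be nonzero,
    which holds for any stroke with an actual attack.
    """
    normed_speed = v2 * abs(a_min / a_max)   # v2 scaled by the extrema ratio
    delta_t = abs(t_max - t_min)             # interval between the extrema
    return v1, normed_speed, delta_t
```

The ratio $|a_{min}/a_{max}|$ is dimensionless, so the normed speed keeps the unit of $v_2$ while reflecting how strong the deceleration is relative to the attack.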
[Figure: integrated speed panels (b) Détaché, (e) Martelé, (k) Spiccato]
Figure 5.3: Audio - Gesture data correlation. From top to bottom: audio
signal energy, audio spectrogram, integrated speed absolute value, dynamic
acceleration. X-axis in seconds. Panel labels: audio energy, spec 256 hann(64)
16, speed from x acceleration, x acceleration.
Figure 5.4: Bow Stroke Segmentation. 5.4(a) and 5.4(b) represent the
acceleration signal and the bow stroke segmentation for two détaché bow strokes.
5.4(c) and 5.4(d) represent the acceleration signal and the bow stroke
segmentation for two martelé bow strokes. There is a strong deceleration peak
in martelé which is part of the execution of the bow stroke and is not the
beginning of the following one.
Figure 5.5: Segmentation Steps. Each line concerns a different bow stroke:
from top to bottom, détaché, martelé, piqué and spiccato. From left to
right: raw dynamic acceleration, filtered and thresholded signal with a
64-sample Hann window, and the segmentation markers. Panel titles, in order of
appearance: threshold=20000, threshold=50000, threshold=15000, threshold=10000.
[Figure: Characterization of the attack. Speed versus time curve annotated
with V1, a1, a2, V2 and T.]
Chapter 6
6.1 Measurement Protocol
The violin player was asked to perform series of bow strokes of each type:
détaché, martelé, piqué and spiccato. Each type of bow stroke was performed on
different strings (one per measurement), with different left-hand fingerings,
nuances and tempi.
We simultaneously recorded audio with a cardioid KM-140 microphone placed
about 50 cm from the violin. The audio was digitized by a MOTU 828 sound card.
Synchronization with the gestural data was done in Max/MSP. In order to align
sound and data, we triggered a Heaviside (step) function together with a
sinusoidal wave, and recorded the former as gestural data and the latter as
sound.
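One possible way to exploit these two markers afterwards is to locate the step in the gestural recording and the onset of the sinusoid in the audio recording, and to take the difference of the two times as the offset between the streams. The sketch below is an assumption about how such an alignment could be computed offline, not a description of the Max/MSP patch; the function names, the detection levels and the sampling rates passed as arguments are all hypothetical.

```python
import numpy as np

def first_crossing_time(signal, level, fs):
    """Time in seconds of the first sample whose magnitude exceeds `level`."""
    idx = np.flatnonzero(np.abs(np.asarray(signal, dtype=float)) > level)
    if idx.size == 0:
        raise ValueError("marker not found in signal")
    return idx[0] / fs

def audio_gesture_offset(step, sine, fs_gesture, fs_audio,
                         step_level, sine_level):
    """Offset (s) to subtract from audio times to align them with gesture times.

    step : gestural channel containing the recorded Heaviside function
    sine : audio channel containing the recorded sinusoid
    """
    t_gesture = first_crossing_time(step, step_level, fs_gesture)
    t_audio = first_crossing_time(sine, sine_level, fs_audio)
    return t_audio - t_gesture
```

The precision of such an estimate is bounded by the slower stream: at a gestural sampling rate of 250 Hz, the step can only be located to within one 4 ms sample.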
The data was exported to text files in order to study it in Matlab. The
segmentation algorithm described in chapter 5 was then used to constitute
the bow stroke database. We extracted the features described in chapter 5
from each bow stroke of the database and plotted them in the space
$\{ v_1 ;\ v_2 \left| \frac{a_{min}}{a_{max}} \right| ;\ \Delta t \}$
6.2 Results
The features are plotted in Figure 6.1. Each point represents a bow stroke
played in a certain way: blue is for détaché, red for martelé, green for
piqué and black for spiccato. The plotted data represents the feature values
for notes played moderato (60 bpm), mezzo forte, on each of the four strings.
The top left plot is a 3D view of the bow stroke feature points. This 3D space
is generated by $v_1$, $v_2 \left| \frac{a_{min}}{a_{max}} \right|$ and
$\Delta t$. The three other plots are the projections of the data on each
coordinate plane.
Figure 6.1: Features of bow strokes for different tones played moderato (60
bpm), mezzo forte, on the four strings. The top left plot is a 3D
representation of the clusters. The three other plots are projections on the
coordinate planes. Each point represents a bow stroke played in a certain way.
Blue is for détaché, red for martelé, green for piqué and black for spiccato.
There are approximately 200 points per type of bow stroke. Plot title: Do5 La3
Si5 Sol4. Panel label: x=speed1, y=delta T.
We can identify four distinct clusters corresponding to the four types of
bow strokes we analyzed. The features show a first invariance property in
that they stay clustered whatever the string played. This property is not
obvious considering that playing on the G string at the same nuance as on
6.3
We now examine the feature space properties more deeply considering changes
in nuance and tempo.
6.3.1 Nuance variations
Figure 6.2 plots the feature points for a note played moderato (60 bpm) at
the nuances pianissimo (symbol +), mezzo forte (symbol .) and fortissimo
(symbol x).
The first result is that the points are still clustered, which reinforces our
invariance property. If we now look at the points in more detail, we can see
that a modification in nuance results in variations of $v_1$ and of
$v_2 \left| \frac{a_{min}}{a_{max}} \right|$, i.e. variations in speed amplitude
rather than in the time interval between the acceleration extrema,
$|t_{max} - t_{min}|$. This can be explained by the attacks being less marked in
softer nuances than in louder ones. We can therefore determine variation
directions directly related to nuance: a bow stroke performed louder yields a
feature point with higher $v_1$ and higher
$v_2 \left| \frac{a_{min}}{a_{max}} \right|$, and vice versa.
The feature space clusters show some interesting properties. Indeed, we
can see that the points corresponding to piqué fortissimo are close to martelé
pianissimo, which seems pertinent from a violinist's point of view. More
generally, fortissimo bow strokes tend to be more marked and therefore
come closer to martelé, as is the case for spiccato and détaché.
[Figure 6.2. Panel label: x=speed1, y=delta T. Legend: détaché, martelé,
piqué, spiccato.]
6.3.2 Tempo variations
This time we focus on the behavior of the feature space with varying tempo.
Figure 6.3 plots the feature points for a note played mezzo forte at the tempi
moderato (60 bpm, symbol .) and allegro (120 bpm, symbol ).
The clusters remain coherent when tempo varies. However, the $\Delta t$
value is not constant across tempi: it decreases when tempo increases. The
variations along this axis remain to be examined.
As with nuance variations, we can see some pertinent variability in
the feature space, especially piqué allegro moving closer to martelé,
which again seems pertinent from a violinist's point of view.
Figure 6.3: Tempo variations. Notes are played mezzo forte moderato (60
bpm, symbol .) and allegro (120 bpm, symbol ). Panel label: x=speed1, y=delta T.
Legend: détaché, martelé, piqué, spiccato.
6.3.3
Figure 6.4 plots the variations according to tempo and nuances. The feature points correspond to a note played pianissimo (symbol +), mezzo forte
(symbol .) and fortissimo (symbol x) at a moderato tempo, and mezzo forte
allegro (symbol ).
This figure illustrates the overlapping of bow stroke feature points. In
particular, the mezzo forte, allegro piqué points (green ) and the pianissimo,
moderato martelé points (red +) cluster in the same region. We do not know
whether this is an artifact of our features or whether the gestures are
actually the same. There are at least two ways of getting some clues about
it. We have seen the strong correlation between bow speed and audio, so a
detailed spectrum analysis might help answer the question. The other
way concerns psychoacoustic studies. We have indeed stressed the perceptual
pertinence of the feature space. It would therefore be interesting to see
whether subjects reproduce the same confusion.
Figure 6.4: Nuance and Tempo variations. Notes are played pianissimo
(symbol +), mezzo forte (symbol .) and fortissimo (symbol x) at a moderato
tempo, and mezzo forte allegro (symbol ). Panel label: x=speed1, y=delta T.
Legend: détaché, martelé, piqué, spiccato.
6.3.4 Player variations
We asked professional violinist Jeanne-Marie Conquer to perform the bow
strokes described in the measurement protocol. Figure 6.5 shows the feature
points from the analysis of her movements (points marked +). The points
marked (.) come from measurements of my own movements. Note that Jeanne-Marie
Conquer performed only three types of bow strokes, since for her piqué and
spiccato are the same bow stroke.
The clusters remain coherent between the two players, which shows a
strong invariance property. However, we should keep in mind that the performed
bow strokes may be stereotyped because they were played outside a musical
context. Further analysis should be carried out on musical excerpts, and with
more players of different skill levels, in order to study finer differences in
playing.
Figure 6.5: Player variations. The feature points marked (+) are those of
violinist Jeanne-Marie Conquer. The feature points marked (.) are mine. Blue
is détaché, red martelé and black spiccato. The clusters show a strong
invariance property. Panel label: x=speed1, y=delta T.
6.3.5
Chapter 7
Bibliography
[1] Anders Askenfelt. Measurement of bow motion and bow force in violin
playing. J. Acoust. Soc. Am., 80(4), 1986.
[2] Anders Askenfelt. Measurement of the bowing parameters in violin
playing. ii: Bow-bridge distance, dynamic range, and limits of bow
force. J. Acoust. Soc. Am., 86(2), 1989.
[3] Emmanuel Flety, Nicolas Leroy, Jean-Christophe Ravarini, and Frederic
Bevilacqua. Versatile sensor acquisition system utilizing network technology. In Proceedings of the Conference on New Instruments for Musical Expression, NIME, 2004.
[4] Suguru Goto and Takahiko Suzuki. The case study of application of
advanced gesture interface and mapping interface, - virtual musical instrument le superpolm and gesture controller bodysuit. In Proceedings
of the Conference on New Instruments for Musical Expression, NIME,
2004.
[5] Camille Goudeseune. Composing with parameters for synthetic instruments. PhD thesis, University of Illinois Urbana-Champaign, 2001.
[6] K Guettler, E Schoonderwaldt, and A Askenfelt. Bow speed or bowing
position - which one influences spectrum the most? In Proceedings of
the Stockholm Music Acoustics Conference, SMAC, 2003.
[7] Camille Goudeseune, Guy Garnett, and Timothy Johnson. An interface for
real-time classification of articulations produced by violin bowing. In
Proceedings of the Conference on Human Factors in Computing Systems, 2001.
[8] Emily Morin. Captation de modes de jeu instrumentaux : le cas du
violoncelle. Technical report, Ircam, 1999.
[9] Emily Morin. Analyse des coups d'archet du violoncelliste avec le dispositif digibow. Technical report, Ircam, 2000.
[10] Charles Nichols. The vbow: Development of a virtual violin bow haptic
human-computer interface. In Proceedings of the Conference on New
Instruments for Musical Expression, NIME, 2002.
[11] Chad Peiper, David Warden, and Guy Garnett. An interface for realtime classification of articulations produced by violin bowing. In Proceedings of the Conference on New Instruments for Musical Expression,
NIME, 2003.
[12] J.C. Schelleng. The bowed string and the player. J. Acoust. Soc. Am.,
53(1), 1973.
[13] Bernd Schoner, Chuck Cooper, Chris Douglas, and Neil Gershenfeld.
Data-driven modelling and synthesis of acoustical instruments. In Proceedings of the International Computer Music Conference, ICMC, 1998.
[14] Bernd Schoner, Chuck Cooper, Chris Douglas, and Neil Gershenfeld.
Cluster-weighted sampling for synthesis and cross-synthesis of violin
family instrument. In Proceedings of the International Computer Music
Conference, ICMC, 2000.
[15] E Schoonderwaldt, K Guettler, and A Askenfelt. Effect of the width
of the bow hair on the violin string spectrum. In Proceedings of the
Stockholm Music Acoustics Conference, SMAC, 2003.
[16] Stefania Serafin and Diana Young. Bowed string physical model validation through use of a bow controller and examination of bow strokes.
In Proceedings of the Stockholm Music Acoustics Conference, SMAC,
2003.
[17] Stefania Serafin and Diana Young. Toward a generalized friction controller: from the bowed string to unusual musical instruments. In Proceedings of the Conference on New Instruments for Musical Expression,
NIME, 2004.
[18] Atau Tanaka. Trends in Gestural Control of Music, chapter Musical
Performance Practice on Sensor-based Instruments, pages 389-406.
Ircam, 2000.
[19] Dan Trueman and Perry R. Cook. Bossa: The deconstructed violin
reconstructed. In Proceedings of the International Computer Music
Conference, ICMC, 1999.
[20] Marcelo Wanderley. Performer-Instrument interaction: applications
to gestural control of sound synthesis. Thèse, Université Paris 6, 2001.
[21] M.M. Wanderley and M. Battier, editors. Trends in Gestural Control
of Music. Ircam, 2000.