Score-Informed Analysis of Intonation and Pitch Modulation in Jazz Solos
824 Proceedings of the 16th ISMIR Conference, Malaga, Spain, October 26-30, 2015
as well, which is an essential pre-processing step for analyzing the applied frequency modulation techniques. There are many studies on vibrato detection in audio recordings [14], particularly for singing voice [8,9,12]. Other publications deal with analyzing the deviation of f0 contours from the target pitch [8] as well as with segmenting f0 contours based on modulations such as vibrato and pitch glides [12] or bendings [10]. To the best knowledge of the authors, no publication so far analyzes intonation and modulation techniques in recorded jazz solos.

4. METHOD

Figure 1 gives an overview of our analysis approach; all processing steps are detailed in the following sections. Section 4.1 describes the dataset of jazz solo audio excerpts and transcriptions. Two separate score-informed analysis [...] Section 5, we analyze how these features depend on contextual parameters such as tone duration and pitch and whether these might be specific for the personal style.

[...] all musicians in the dataset with their instrument, the number of solos $N_S$, and the total number of tones and f0 contours $N_N$, respectively. The solos were manually annotated by musicology and jazz students based on excerpts from commercial audio recordings. The annotations include score-level melody transcription (MIDI pitch, tone onset, and duration) as well as additional annotation layers with respect to melody phrases, metric structure, chords, and modulation techniques. So far, the tone-wise annotations of modulation techniques are incomplete and only represent the clearest examples within the solos. Figure 2 gives an overview of the number of annotated tones per artist. In total, 87643 tones and f0 contours are included in the dataset.

Figure 2: Number of tones of each artist which are annotated with fall-off, slide, and vibrato.
Table 1: Overview of all artists in the dataset. For each artist, the number of solos $N_S$, the total number of notes $N_N$, as well as the instrument is given (ts: tenor saxophone, ss: soprano saxophone, as: alto saxophone, cl: clarinet, tp: trumpet, cor: cornet, tb: trombone, ts-c: C melody tenor saxophone).
is computed. We use a logarithmic frequency axis with a high resolution of 50 bins/semitone and a frequency range of 2 semitones around the annotated pitch. Based on an initial short-time Fourier transform (STFT) with a blocksize of 1024, a hopsize of 64, and a zero-padding factor of 16, the magnitude values are mapped (reassigned) towards the frequency bins that correspond to their instantaneous frequency values at the original frequency bins, computed using the method proposed by Abe in [1]. Two steps are performed for each tone to estimate its f0 contour. First, we estimate a suitable starting frame within the tone's duration with a prominent peak close to the annotated pitch. Second, we perform a contour tracking both forwards and backwards in time. Further details are provided in [3].

4.4 Tuning Frequency Estimation

The oldest recordings in our dataset date back to the year 1924, two years before the American music industry recommended 440 Hz for A4 as standard tuning, and 12 years before the American Standards Association officially adopted it. Hence, we cannot rely on the assumption of a constant and fixed overall tuning. Moreover, the technical level of recording studios was rather low at this time, which might result in tuning deviations caused by speed fluctuations of recording machines as well as by instruments tuned to another reference pitch, such as studio or live venue pianos. Hence, we estimate a reference tuning frequency $f_\mathrm{ref}$ prior to the intonation analysis of the solo instrument from the backing track of the rhythm section, which we obtain from the source separation process explained in Section 4.2. The reference tuning frequency corresponds to the fundamental frequency of the pitch A4 in the backing track.

In the Chroma Toolbox [13], a triangular filterbank is generated based on a given tuning frequency in such a way that its center frequencies are aligned to the chromatic scale within the full piano pitch range. For a given audio signal, the magnitude spectrogram is averaged over the full signal duration and processed using the filterbank. By maximizing the filterbank output energy over different tuning frequency hypotheses, a final tuning frequency estimate $f_\mathrm{ref}$ is derived. We modified the originally proposed search range for $f_\mathrm{ref}$ to 440 Hz ± 0.5 semitone (corresponding MIDI pitch range: 69 ± 0.5) and the stepsize to 0.1 cents. As will be shown in Section 5.1, the influence of source separation artifacts on the estimation accuracy of the reference tuning frequency can be neglected.

4.5 Feature Extraction

Based on the estimated contour $f_0(n)$ of each tone, we first perform a smoothing using a two-element moving average filter in order to compensate for local irregularities and possible estimation errors. The extracted audio features describe the gradient of the f0 contour as well as its temporal modulation. Table 2 lists all computed audio features and their dimensionality.

Category     Feature Label                                          Dim.
Gradient     Linear slope                                           1
Gradient     Median gradients (first half, second half, overall)    3
Gradient     Ratio of ascending frames                              1
Gradient     Ratio of ascending / descending / constant segments    3
Gradient     Median gradient of longest segments                    1
Gradient     Relative duration of longest segments                  1
Gradient     Pitch progression                                      1
Modulation   Modulation frequency [Hz]                              1
Modulation   Modulation dominance                                   1
Modulation   Modulation range [cent]                                1
Modulation   Number of modulation periods                           1
Modulation   Average relative / absolute f0 deviation               2
Modulation   f0 deviation inter-quartile-range                      1

Table 2: Summary of audio features describing the f0 contours.

4.5.1 Gradient features

Based on the gradient $\Delta f_0(n) = f_0(n+1) - f_0(n)$, we first determine frames and segments of adjacent frames with ascending ($\Delta f_0(n) > 0$), descending ($\Delta f_0(n) < 0$), and constant frequency. We use the relative duration (with respect to the note duration) of each segment class as features. Also, we compute median gradients in the first and second halves, over the whole note, as well as over the longest segment. Overall pitch progression is measured by the difference of average f0 values at the end and the beginning of each tone. Furthermore, we use linear regression to estimate the linear slope of the f0 contour.

4.5.2 Modulation features

We analyze the modulation of the f0 contour by computing the autocorrelation over $f_0(n)$. Fletcher [7] reported for woodwind instruments that a vibrato frequency range between 5 and 8 Hz is comfortable for listeners and common for players. We add a safety margin of 2 Hz and search for the lag position $\tau_\mathrm{max}$ of the highest local maximum within the lag range that corresponds to fundamental frequency values of $f_\mathrm{mod} \in [3, 10]$ Hz and estimate the modulation frequency as $f_\mathrm{mod} = 1/\tau_\mathrm{max}$. The difference between the maximum and median magnitude within this frequency band is used as dominance measure for the modulation. Other applied features are the number of modulation periods and the frequency modulation range in cent.

4.6 Analysis of Intonation and Modulation Techniques

We distinguish three modulation techniques: fall-off, slide, and vibrato. Table 3 describes the characteristic f0 contour shape of each technique and gives the number of tones in our dataset annotated with it.

Technique   Description                                                                         Notes
Fall-off    Drop of the f0 contour at the end of the tone after a stationary part.              146
Slide       Rise or drop of the f0 at the beginning of the tone towards a stationary part.      708
Vibrato     Periodic modulation of the f0 contour during the stationary part of the tone.       1380
None        No discernible modulation of the f0 contour / No modulation technique annotated.    83587

[...] as on the backing track obtained from the source separation of the solo part (compare Section 4.2) to get two estimates $f_\mathrm{ref}^\mathrm{NoSolo}$ and $f_\mathrm{ref}^\mathrm{Backing}$ of the reference tuning frequency. The results show a very high sample correlation of r = 0.97 (p < 0.001) and a small root mean squared error of RMSE = 1.05 Hz between both estimates. These results indicate that the influence of source separation artifacts is negligible for the tuning estimation process. Therefore, we will use $f_\mathrm{ref} = f_\mathrm{ref}^\mathrm{Backing}$ as an estimate of the reference tuning frequency throughout the paper.

5.2 Relationship between the Reference Tuning and the Recording Year / Decade

How did the tuning frequency $f_\mathrm{ref}$ of commercial jazz recordings change during the 20th century? Figure 3 shows the distribution of solos in the dataset over the decades from the 1920s to the 2000s. Moreover, the inserted boxplots illustrate the deviation $\Delta f = 1200 \log_2 (f_\mathrm{ref} / 440)$ between the tuning frequency $f_\mathrm{ref}$ and 440 Hz in cent.

Absolute tuning deviation $|\Delta f|$ and recording year of each solo are weakly negatively correlated (r = -0.33, p < 0.001). Hence, the absolute deviation of the tuning frequency from 440 Hz decreased over the course of the 20th century, reflecting the spread of the 440 Hz standard (adopted by the International Standards Organization in 1955), as well as the progress of studio technology.

[Figure 3: distribution of solos per decade (1920-1930 to 1990-2000) with inserted boxplots; axes: Δf [cent], Number of solos.]
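The deviation measure used in Section 5.2 can be illustrated with a minimal Python sketch. The function names and the example frequencies are hypothetical and are not taken from the dataset; the second helper only inverts the formula to show which frequencies correspond to the ± 0.5 semitone (± 50 cent) search range mentioned in Section 4.4.

```python
import math

STANDARD_A4_HZ = 440.0  # standard tuning reference used in the text


def tuning_deviation_cents(f_ref_hz, standard_hz=STANDARD_A4_HZ):
    """Deviation of a reference tuning frequency from the standard, in cent."""
    return 1200.0 * math.log2(f_ref_hz / standard_hz)


def cents_to_hz(cents, standard_hz=STANDARD_A4_HZ):
    """Inverse mapping: frequency that lies `cents` above the standard."""
    return standard_hz * 2.0 ** (cents / 1200.0)


# Hypothetical tuning estimates (Hz), chosen for illustration only.
for f_ref in (435.0, 440.0, 446.0):
    print(f"{f_ref:.1f} Hz -> {tuning_deviation_cents(f_ref):+.2f} cent")

# Bounds of a +/- 0.5 semitone search range around 440 Hz.
low_hz, high_hz = cents_to_hz(-50.0), cents_to_hz(50.0)
print(f"search range: {low_hz:.2f} Hz to {high_hz:.2f} Hz")
```

Negative values indicate a tuning below 440 Hz, positive values a tuning above it; the absolute value corresponds to $|\Delta f|$ in the correlation analysis.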
few exceptions: Sidney Bechet, a traditional soprano saxophonist, has very high values; however, presumably this is caused not by a sharp intonation but by the high percentage of pitch slides played by him (almost 15 % of the tones, cf. Figure 2).

For most players, the range of frequency modulation, i.e., the size of vibrato, is around 25 cent. There are some bigger modulation ranges from 35 to 50 cent, predominantly used by tenor saxophone players associated with swing style (Ben Webster, Coleman Hawkins, Don Byas, and Lester Young), but also by postbop tenor saxophonist Joe Lovano, and, again, by Sidney Bechet, who shows the largest variance of modulation ranges. Therefore, there are some slight personal and stylistic peculiarities in the use of vibrato size. However, there are no obvious trends of intonation according to different instruments (cf. Figure 5), since for each instrument there seem to be players who play a bit sharp as well as players who play a bit low; note that for trombone and C melody sax there is only one musician each (J.J. Johnson and Joe Lovano, respectively) in our sample. Likewise, there is no evidence for general trends of modulation ranges with respect to instrument.

5.4 Context-dependency of the Modulation Frequency of Vibrato

Does the modulation frequency of vibrato depend on pitch, or duration of the vibrato tones, or on the tempo of the piece? For the 1380 vibrato tones (cf. Table 3), we found no significant correlations between modulation frequency and pitch (r = 0.02, p = 0.42), duration (r = 0.02, p = 0.5), or tempo (r = 0.0, p = 0.83). The small effect size of the correlation indicates that despite the high variety of tempo values in the dataset (mean tempo 154.52 bpm, standard deviation 68.16 bpm), the modulation frequency only slightly increases with increasing tempo.

Furthermore, we investigated whether and how the modulation frequency of vibrato is connected to the underlying metrical structure of a solo. We computed the ratio $r = T_\mathrm{mod} / T_\mathrm{solo}$ between the modulation tempo and the average tempo of the solo. The modulation tempo is computed as $T_\mathrm{mod} = 60 f_\mathrm{mod}$. Figure 6 shows the ratio r against the average tempo of the solo. There is no evidence in our data for a strategy to adapt the modulation frequency of vibrato to integer multiples of the tempo of the piece, e.g., to use a vibrato speed according to a simple subdivision of the beat (e.g., eighth notes or eighth triplets).

As Figure 6 shows, for medium and fast tempos (100 to
[Figure residue: per-performer plots of AvF0Dev [cent] and ModRange [cent]; AvF0Dev [cent] grouped by instrument (cf. Figure 5); Tmod / Tsolo plotted against Tsolo (cf. Figure 6).]
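The quantities used in Section 5.4 can be sketched in Python as follows: the modulation frequency is taken from the highest local maximum of the autocorrelation within the lag range for 3 to 10 Hz (Section 4.5.2), converted to a modulation tempo $T_\mathrm{mod} = 60 f_\mathrm{mod}$, and divided by the solo tempo. This is an illustrative sketch, not the authors' implementation. The function names, the frame rate parameter, the mean removal, and the unnormalized autocorrelation are assumptions, and the test contour is synthetic.

```python
import math


def modulation_frequency(f0_contour, frame_rate_hz, f_min=3.0, f_max=10.0):
    """Estimate the modulation frequency (Hz) of an f0 contour.

    Returns None if the autocorrelation has no local maximum within the
    lag range that corresponds to [f_min, f_max].
    """
    n = len(f0_contour)
    if n < 3:
        return None
    mean = sum(f0_contour) / n
    x = [v - mean for v in f0_contour]  # assumption: remove the pitch offset

    lag_lo = max(1, math.ceil(frame_rate_hz / f_max))        # shortest period
    lag_hi = min(n - 2, math.floor(frame_rate_hz / f_min))   # longest period

    # Unnormalized autocorrelation for all lags needed by the peak search.
    acf = {lag: sum(x[i] * x[i + lag] for i in range(n - lag))
           for lag in range(lag_lo - 1, lag_hi + 2)}

    best_lag, best_val = None, -math.inf
    for lag in range(lag_lo, lag_hi + 1):
        is_local_max = acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]
        if is_local_max and acf[lag] > best_val:
            best_lag, best_val = lag, acf[lag]
    if best_lag is None:
        return None
    return frame_rate_hz / best_lag  # f_mod = 1 / tau_max, tau_max in seconds


def tempo_ratio(f_mod_hz, solo_tempo_bpm):
    """Ratio between modulation tempo (cycles per minute) and solo tempo."""
    t_mod = 60.0 * f_mod_hz
    return t_mod / solo_tempo_bpm


# Synthetic 5 Hz vibrato of +/- 25 cent around an offset of 10 cent,
# sampled at a hypothetical frame rate of 100 frames per second.
frame_rate = 100.0
contour = [10.0 + 25.0 * math.sin(2.0 * math.pi * 5.0 * k / frame_rate)
           for k in range(200)]
f_mod = modulation_frequency(contour, frame_rate)
ratio = tempo_ratio(f_mod, solo_tempo_bpm=150.0)
```

The lag resolution limits the frequency resolution of the estimate, so a higher frame rate gives finer estimates of the modulation frequency.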
Figure 7: Modulation frequency in Hz in vibrato notes for different performers. Only performers with more than 20 vibrato notes are shown.

playing standards for brass instruments in classical music, where it is customary to play without any vibrato [7].

5.6 Automatic Classification of Frequency Modulation Techniques

Using the set of features discussed in Section 4.5, we extracted an 18-dimensional feature vector for each tone, which was used to automatically classify tones with respect to their modulation class. To this end, we only considered tones annotated with fall-off, slide, and vibrato since all remaining tones were not explicitly annotated. We used a Support Vector Machine (SVM) classifier with a linear kernel function as classification algorithm and performed a 10-fold cross-validation. Due to the imbalanced class sizes (cf. Table 3), we repeatedly re-sampled from the existing class items such that all classes have the same number of items as the largest class from the original dataset.

The confusion matrix is shown in Table 4. The highest accuracy of 92.25 % was achieved for vibrato tones. The classes fall-off and slide show lower accuracy values of 48.04 % and 67.32 %, respectively. One might assume that the similar f0 contour shapes of fall-offs and slide-downs cause part of the confusions between both classes.

Correct \ Classified   Fall-off   Slide   Vibrato
Fall-off               48.04      37.46   14.49
Slide                  23.55      67.32   9.13
Vibrato                4.06       3.7     92.25

Table 4: Confusion matrix for the automatic classification of frequency modulation techniques. All values are given in percent.

6. CONCLUSIONS

In this exploratory study, we proposed a score-informed algorithm for the extraction of non-syntactical features in jazz solos played with wind and brass instruments. This method allows for an analysis of performative and expressive aspects of jazz improvisation which are not captured by the traditional approaches of jazz research such as transcriptions (even though some rudimentary notation for f0-modulations is used sometimes). In this study we demonstrated exemplarily that our method can be readily applied for a range of different research questions, from historical analysis of reference tuning in 20th century jazz recordings to more general questions such as intonation accuracy or differences in f0 modulations with respect to tempo, instrument class, stylistic trends, or personal style.

As a case study, we investigated whether some of these expressive aspects, i.e., intonation, slides, vibrato speed and vibrato range, are correlated with structural features of the solos (absolute pitch, tone duration, overall tempo, meter) and whether those aspects are characteristic for an instrument, a jazz style or the personal style of a musician. While there is little evidence for a general correlation between intonation and pitch modulation (slide, vibrato) on the one hand, and structural features on the other hand, the issue of how intonation and pitch modulation contribute to the formation of a jazz style and personal style needs further examination with more data and including listening tests for style discrimination.

For the future, we plan to complete and refine the f0-modulation annotations for the dataset, with the overall goal of designing an automated f0-modulation annotation algorithm. Finally, we aim at a complete description of personal timbre characteristics, the so-called sound of a player, which is an important dimension of jazz music and not yet fully addressed. Dynamics [2], intonation, articulation, and f0-modulation are part of this sound, but other aspects such as breathiness, roughness and general spectral characteristics (and their classification) are still to be explored.

7. ACKNOWLEDGEMENTS

The Jazzomat research project is supported by a grant DFG-PF 669/7-1 (Melodisch-rhythmische Gestaltung von Jazzimprovisationen. Rechnerbasierte Musikanalyse einstimmiger Jazzsoli) by the Deutsche Forschungsgemeinschaft (DFG). The authors would like to thank all jazz and musicology students participating in the transcription and annotation process.
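The class balancing described in Section 5.6 (re-sampling from the existing class items until every class is as large as the largest one) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, items are drawn with replacement, the original items are kept, and integer identifiers stand in for the 18-dimensional feature vectors. The class sizes follow Table 3.

```python
import random


def oversample_to_largest(items_by_class, seed=0):
    """Pad every class with randomly re-drawn items up to the largest class size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    target = max(len(items) for items in items_by_class.values())
    balanced = {}
    for label, items in items_by_class.items():
        # Assumption: extra items are drawn with replacement from the class.
        extra = rng.choices(items, k=target - len(items))
        balanced[label] = list(items) + extra
    return balanced


# Integer identifiers stand in for per-tone feature vectors.
annotated = {
    "fall-off": list(range(146)),
    "slide": list(range(708)),
    "vibrato": list(range(1380)),
}
balanced = oversample_to_largest(annotated)
print({label: len(items) for label, items in balanced.items()})
```

When such balancing is combined with cross-validation, re-sampling only within the training folds avoids copies of the same tone appearing in both training and test data; whether the original study did so is not stated in the text.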