DOI 10.1007/s10844-010-0140-5
Abstract In this paper, the influence of selected sound features on distinguishing between musical instruments is presented. The features were chosen based on our previous research. Coherent groups of features were created on the basis of significant features, according to the parameterization method applied, in order to constitute small, homogeneous groups. In this research, we investigate (for each feature group separately) whether there exist significant differences between the means of these features for the studied instruments. We apply analysis of variance along with post hoc comparisons in the form of homogeneous groups, defined by the mean values of the investigated features for our instruments. If a statistically significant difference is found, then homogeneous groups are established. Such a group may consist of only one instrument (distinguished by this feature), or more (instruments similar with respect to this feature). The results show which instruments can be best discerned by which features.
A. Wieczorkowska (B)
Polish-Japanese Institute of Information Technology,
Koszykowa 86, 02-008 Warsaw, Poland
e-mail: [email protected]

A. Kubik-Komar
University of Life Sciences in Lublin,
Akademicka 13, 20-950 Lublin, Poland
e-mail: [email protected]

1 Introduction
Large repositories of audio data available for users are challenging from the point of view of content-based retrieval. The users can be interested in finding melodies sung into a microphone (query by humming), identifying the title and the performer of a piece of music submitted as an audio input containing a short excerpt from the piece (query by example), or finding pieces played by their favorite instruments. Browsing audio files manually is a tedious task, so any automation comes in very handy. If the audio data are labeled, searching is easy, but usually the text information added to an audio file is limited to the title, performer, etc. Automatic content annotation therefore usually requires sound analysis: sound features are extracted, and the contents are then classified into various categories in order to fulfill the user's query and find the contents specified.
The research presented in this paper is an extended version of an article presented
at the ISMIS’09 conference (Wieczorkowska and Kubik-Komar 2009a), addressing
the problem of instrument identification in sound mixes. The identification of
instruments playing together can aid automatic music transcription with assigning
recognized pitches to instrument voices (Klapuri 2004). Also, finding pieces of music
with excerpts played by a specified instrument can be desirable for many users of
audio repositories. Therefore, investigating the problem of automated identification
of instruments in audio recordings is vital for music information retrieval tasks.
In our earlier research (Wieczorkowska et al. 2008), we performed automatic recognition of the predominant instrument in sound mixes using SVM (Support Vector Machines). The feature vector applied had been used before in research on automatic classification of instruments (NSF 2010; Zhang 2007), and it contains sound attributes commonly used for timbre identification purposes. Most of the attributes describe low-level sound properties, based on MPEG-7 audio descriptors (ISO/IEC JTC1/SC29/WG11 2004), and since many of them are multi-dimensional, derivative features (minimal/maximal value, etc.) were used instead. Still, the feature vector is quite long, and it contains groups of attributes that can constitute a descriptive feature set themselves. In this research, we decided to compare the descriptive power of
these groups. In-depth statistical analysis of the investigated sets of features for the
selected instruments is presented in this paper.
Our paper is organized as follows. In Section 2 we briefly familiarize the reader with the task and problems related to the automatic identification of musical instrument sounds in audio recordings. The feature groups used in our research are also presented there. In the next section, we describe the settings and methodology of our research, as well as the audio data used to produce the feature vectors. In Section 4 we describe in depth the results of the performed analyses. The last section concludes our paper.
2 Identification of musical instrument sounds in audio recordings
Since audio data basically represent sequences of samples encoding the shape of the sound wave, these data are usually processed in order to extract feature vectors, and then automatic classification of the audio data is performed. Sound features used for musical instrument identification include time-domain descriptors, spectral descriptors, and time-frequency descriptors, and can be based on Fourier or wavelet analysis, etc. Feature sets applied in research on instrument recognition
Fig. 1 Sounds of the same pitch and their mixes. On the left hand side, time domain representation
of sound waves is shown; on the right hand side, spectrum of these sounds is plotted. Triangular
wave and flute sound are shown, both of frequency 440 Hz (i.e., A4 in MIDI notation). After
mixing, spectral components (harmonic partials) overlap. The diagrams were prepared using Adobe
Audition (Adobe 2003)
We focus on mixes of sounds of the same pitch, as this is the most difficult case (harmonic partials in the spectra overlap). An example mix of two sound waves of the same pitch is shown in Fig. 1. As we can see, the flute sound is much more difficult to recognize after adding another sound with an overlapping spectrum.
When investigating the significance of the set of 219 sound parameters (including the ones mentioned above) used in our previous research, the attributes representing the above groups were often pointed out as significant, i.e. of high discriminant power (Wieczorkowska and Kubik-Komar 2009b). Therefore, it seems promising to perform investigations for the groups mentioned above.
Since the AudioSpectrumBasis group is a high-dimensional vector itself, and its first subgroup basis1, . . . , basis5 turned out to have high discriminant power, we decided to limit the AudioSpectrumBasis group to basis1, . . . , basis5. In the AudioSpectrumFlatness group, flat10, . . . , flat25 had high discriminant power, whereas flat1, . . . , flat9 did not, as we observed in our previous research (Wieczorkowska and Kubik-Komar 2009b). We also decided to investigate energy as a single conditional attribute, as well as the Tris group and the MFCC group. One could discuss whether such parameters as the minimum or maximum of MFCC (MFCCmin, MFCCmax) are meaningful, but since these parameters yielded high discriminant power, we decided to investigate them.
Altogether, the following groups (feature sets) were investigated in this paper: AudioSpectrumBasis (basis1, . . . , basis5), MFCC (MFCCmin, MFCCmax, MFCCmean, MFCCdis, MFCCstd), Energy, Tris (tris1, . . . , tris9), and AudioSpectrumFlatness (flat10, . . . , flat25).
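In code form, the investigated groups can be summarized as a simple mapping from group name to its member features; the sketch below (Python, with identifiers chosen purely for illustration, not part of any published tool) only restates the list above.

```python
# Feature groups investigated in this study; each group is analyzed separately
# with MANOVA/ANOVA. Names are illustrative identifiers, not an official API.
FEATURE_GROUPS = {
    "AudioSpectrumBasis": [f"basis{i}" for i in range(1, 6)],       # basis1..basis5
    "MFCC": ["MFCCmin", "MFCCmax", "MFCCmean", "MFCCdis", "MFCCstd"],
    "Energy": ["energy"],                                           # single attribute
    "Tris": [f"tris{i}" for i in range(1, 10)],                     # tris1..tris9
    "AudioSpectrumFlatness": [f"flat{i}" for i in range(10, 26)],   # flat10..flat25
}
```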
3 Research settings
In order to check whether particular groups of sound features can discriminate instruments, we performed multivariate analysis of variance (MANOVA). Next, we analyzed the parameters from each group using the univariate method (ANOVA). If the null hypothesis about the equality of means between instruments was rejected for a given feature group, we used post hoc comparisons to find out how particular instruments can be discriminated on the basis of the parameters included in this feature group, i.e. which sound attributes from this group are best suited to recognize a given instrument (discriminate it from the other instruments).
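As an illustration of this workflow, the sketch below runs a one-way ANOVA for a single feature and a Tukey HSD post hoc comparison using common Python libraries. The file name, the column names, and the choice of Tukey's HSD (the paper cites Tukey 1993, but the exact post hoc procedure is not restated in this excerpt) are assumptions; this is only an illustrative sketch, not the software actually used for the reported analyses.

```python
# Minimal sketch of the ANOVA + post hoc workflow described above.
# Assumes a CSV with one row per sound and columns 'instrument' plus the
# feature values (e.g. 'basis4'); the file and column names are hypothetical.
import pandas as pd
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

data = pd.read_csv("sound_features.csv")   # hypothetical input file
feature = "basis4"

# One-way ANOVA: does the mean of the feature differ between instruments?
groups = [g[feature].values for _, g in data.groupby("instrument")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Post hoc pairwise comparisons (Tukey's HSD); instruments whose means do not
# differ significantly end up in the same homogeneous group.
tukey = pairwise_tukeyhsd(endog=data[feature], groups=data["instrument"], alpha=0.01)
print(tukey.summary())
```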
Our data represented sounds of 14 instruments from MUMS CDs (Opolko and
Wapnick 1987): B-flat clarinet, flute, oboe, English horn, trumpet, French horn, tenor
trombone, violin (bowed vibrato), viola (bowed vibrato), cello (bowed vibrato),
piano, marimba, vibraphone, and tubular bells. Twelve sounds, representing octave
no. 4 (in MIDI notation), were used for each instrument as the target sounds to be identified in classification. Additional sounds were mixed with the main sounds, both for training and testing of the classifiers in further experiments with automatic classification of musical instruments (Kursa et al. 2009). The level of the added sounds was adjusted to 6.25%, 12.5/√2%, 12.5%, 25/√2%, 25%, 50/√2%, and 50% of the level of the main sound, since our goal was to identify the predominant instrument.
For each main instrumental sound, four additional mixes with artificial sounds were prepared for each level: with white noise, with pink noise, with a triangular wave, and with a saw-tooth wave (the latter two of harmonic spectrum) of the same pitch as the main sound. This set was prepared to be used as a training set for classifiers, subsequently tested on mixes of musical instrument sounds of the same pitch. Each sound to be identified was mixed with 13 sounds of the same pitch, representing the remaining 13 instruments from this data set. Again, the sounds added in the mixes were adjusted in level, at the same levels as in training. The results of these experiments can be found in Kursa et al. (2009).
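As a rough illustration of how such mixes can be produced, the sketch below scales the added sound so that its level is a given fraction of the main sound's level; using RMS as the level measure is an assumption, since the exact level definition is not restated here.

```python
import numpy as np

def mix_with_level(main: np.ndarray, added: np.ndarray, fraction: float) -> np.ndarray:
    """Mix 'added' into 'main' so that the level of 'added' equals 'fraction'
    (e.g. 0.0625 for 6.25%) of the level of 'main'. RMS is used as the level
    measure here, which is an assumption; assumes non-silent inputs."""
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    n = min(len(main), len(added))                    # align lengths
    scaled = added[:n] * (fraction * rms(main[:n]) / rms(added[:n]))
    return main[:n] + scaled

# Example: an added sound mixed at 12.5/sqrt(2) % of the main sound's level
# mix = mix_with_level(main_sound, added_sound, 0.125 / np.sqrt(2))
```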
In this research, we investigated data representing musical instrument sounds, as well as mixes with artificial sounds. All these data were parameterized using the feature sets described in Section 2.
The significance of differences between the instruments was tested with the MANOVA Wilks' Λ statistic, transformed into an approximately F-distributed statistic (Rao 1951):

F = [(1 − Λ^(1/s)) / Λ^(1/s)] · [(m·s + 1 − d_h·p/2) / (d_h·p)]

where

s = √[(p²·d_h² − 4) / (p² + d_h² − 5)],
m = d_e − (p + 1 − d_h)/2,

p is the number of variables, d_h is the number of degrees of freedom for the hypothesis, and d_e is the number of degrees of freedom for the error.
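For reference, this transformation (the standard conversion of Wilks' Λ into an approximately F-distributed statistic, cf. Rao 1951) can be transcribed directly; the degrees of freedom p·d_h and m·s + 1 − p·d_h/2 follow that approximation.

```python
import math
from scipy.stats import f as f_dist

def rao_F(wilks_lambda: float, p: int, dh: int, de: int):
    """Transform Wilks' lambda into Rao's approximate F statistic.
    p  - number of variables,
    dh - degrees of freedom for the hypothesis,
    de - degrees of freedom for the error.
    Degenerate cases (p**2 + dh**2 == 5) are not handled in this sketch."""
    s = math.sqrt((p**2 * dh**2 - 4) / (p**2 + dh**2 - 5))
    m = de - (p + 1 - dh) / 2
    df1 = p * dh                       # numerator degrees of freedom
    df2 = m * s + 1 - p * dh / 2       # denominator degrees of freedom
    F = ((1 - wilks_lambda**(1 / s)) / wilks_lambda**(1 / s)) * (df2 / df1)
    p_value = f_dist.sf(F, df1, df2)
    return F, p_value
```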
The results of MANOVA show that the vector of mean values of the analyzed
AudioSpectrumBasis features significantly differed between the instruments (F =
188.0, p < 0.01). On the basis of the univariate results (ANOVA) we conclude
that the means for instruments differed significantly for each parameter separately.
The F statistic, having a Fisher distribution with parameters 13 and 5,866 (i.e. F(13, 5,866)), was equal to 113.8, 115.0, 12.9, 371.78, and 405.7 for basis1, . . . , basis5, respectively, and each of these values produced a p-value less than 0.01. These results
allowed us to apply post hoc comparisons for each of the AudioSpectrumBasis parameters, presented in the form of tables consisting of homogeneous groups of instruments (Fig. 2).
The results of the post hoc analysis revealed that basis4, basis5 and basis1 distinguish instruments to a large extent, whereas the influence of basis2 and basis3 on differentiation between instruments is rather small. Marimba, piano, as well as vibraphone, and the pair of tubular bells and French horn, often determine separate groups. Piano, vibraphone, marimba, cello and trombone are very well separated by basis5, since each of these instruments constitutes a 1-element group. Piano, vibraphone, marimba, and cello are separated by basis4, too. Also, basis1 separates marimba and piano. The basis3 parameter only discerns marimba from the other instruments (only two groups are produced); basis2 does not separate any single instrument.
Fig. 3 Plots of mean values of basis1, . . . , basis5 for the investigated instruments
Looking at the means of the basis parameters (Fig. 3), we can identify the parameters producing similar plots of means: these are basis5, basis4 and, to a lesser degree, basis2. Similar values in the plots of means indicate that the parameters producing these plots represent similar discriminative properties for these data.
Fig. 4 Plots of mean values of MFCCmin, MFCCmax, MFCCmean, MFCCdis, and MFCCstd for the investigated instruments
When the mean values of some parameters are similar for several classes, i.e. instruments (which means that they are almost aligned in the plot, and the distances between these values are short), then these instruments may be placed in the same group, i.e. a group homogeneous with respect to these parameters. When the mean value for a particular instrument is distant from the other mean values, then this instrument is very easy to distinguish from the others with respect to this feature. For example, marimba is well distinguished from the other instruments with respect to basis3 (the mean value for marimba is distant from the others), whereas the mean values of basis4 and basis5 for English horn and trumpet are similar, and these instruments are situated in the same group, so they are difficult to differentiate on the basis of these features.
We can also see that, despite the lowest number of groups produced by basis3, the difference of means between marimba and the other instruments is very high, so the influence of this parameter on such a distinction (between marimba and the others) is quite important.
The AudioSpectrumBasis group represents features extracted through SVD, so good differentiation of instruments was expected, because SVD should yield the most salient features of the spectrum. On the other hand, these features might not be sufficient for the discrimination of particular instruments. It is a satisfactory result that we can differentiate several instruments on the basis of three features (basis1, basis4, and basis5). In particular, distinguishing between marimba, vibraphone and piano is a very good result, since these instruments pose difficulties in the automatic instrument classification task. Their sounds (particularly marimba and vibraphone) are similar and have no sustained part, and thus no steady state, so spectrum calculation is more difficult; still, as we can see, the spectrum-based features are useful for instrument discrimination purposes.
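To make the origin of these features more concrete, the sketch below derives a few spectral basis functions by applying SVD to a magnitude spectrogram. This only illustrates the idea behind AudioSpectrumBasis; the MPEG-7 descriptor additionally prescribes dB scaling, band grouping and normalization steps that are omitted here, and the window length is an arbitrary choice.

```python
import numpy as np
from scipy.signal import stft

def spectral_basis(signal: np.ndarray, sr: int, n_basis: int = 5) -> np.ndarray:
    """Return the first n_basis left-singular vectors of a magnitude
    spectrogram, as a rough analogue of AudioSpectrumBasis features."""
    _, _, Z = stft(signal, fs=sr, nperseg=1024)
    spectrogram = np.abs(Z)                          # frequency x time magnitudes
    U, s, Vt = np.linalg.svd(spectrogram, full_matrices=False)
    return U[:, :n_basis]                            # most salient spectral shapes
```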
The results of MANOVA indicate that the mean values of MFCC features differ
significantly between the studied instruments (F = 262.84, p < 0.01).
The univariate results show that the means of each parameter from this group (Fig. 4) were significantly different at the significance level of 0.01, with F(13, 5,866) values equal to 218.67, 329.92, 479.27, 698.8, and 550.9 for MFCCmin, MFCCmax, MFCCmean, MFCCdis and MFCCstd, respectively.
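For illustration, this kind of compact MFCC summary can be obtained as sketched below; librosa is an assumed tool, aggregating over all coefficients and frames is an assumption, and MFCCdis is omitted because its exact definition is not restated in this excerpt.

```python
import librosa

def mfcc_summary(path: str) -> dict:
    """Aggregate frame-wise MFCCs into scalar descriptors analogous to those
    discussed above (MFCCdis is omitted; its definition is not restated here)."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # coefficients x frames
    return {
        "MFCCmin": float(mfcc.min()),
        "MFCCmax": float(mfcc.max()),
        "MFCCmean": float(mfcc.mean()),
        "MFCCstd": float(mfcc.std()),
    }
```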
The analysis of homogeneous groups (see Fig. 5) shows that MFCCstd and MFCCmax yielded the highest differences of means, while MFCCmin yielded the lowest. Each feature defined six to nine groups, homogeneous with respect to the mean value of a given feature.
The piano determined a separate group for every parameter from our MFCC feature set. The conclusion is that this instrument is very well distinguished by MFCC. However, there were no parameters here capable of distinguishing between marimba and flute. These two instruments were always situated in the same group, since the average values of the studied parameters for these instruments do not differ much. Vibraphone and bells were in different groups only for MFCCmean.
Piano, cello, viola, violin, bells, English horn, oboe, French horn, and trombone
constitute separate groups, so these instruments can be easily recognized on the basis
Fig. 5 Homogeneous groups for MFCCdis (mean values of MFCCdis for each instrument; each x marks membership in one of groups 1-8)
piano          173.8251   x
frenchhorn     433.2952   x
tubularbells   461.3751   x
vibraphone     471.0583   x
ctrumpet       479.7590   x x
tenorTrombone  497.5668   x
flute          532.8474   x
bflatclarinet  553.1953   x
marimba        555.1694   x
cello          600.2499   x
englishhorn    623.1473   x
violin         647.3444   x
oboe           648.5515   x
viola          659.6647   x
of MFCC. On the other hand, some groups overlap, i.e. the same instrument may
belong to two groups.
The shapes of the plots of mean values of MFCCstd, MFCCdis and, to a lesser extent, MFCCmax (Fig. 4) are very similar; however, the homogeneous groups, apart from piano, are different. As we mentioned before, piano is very well distinguished on the basis of all parameters from the MFCC group, since it always constitutes a separate, 1-element group. This is because the means for piano and the other instruments are in most cases extremely distant.
The MFCC parameters described here represent general properties of the MFCC vector. We consider it a satisfactory result that such a compact representation turned out to be sufficient to discern between many instruments: mainly stringed instruments with sustained sounds, i.e. cello, viola, and violin, and wind instruments, both woodwinds (of very similar timbre, i.e. oboe and English horn, which can be considered a type of oboe) and brass (French horn and trombone). Even non-sustained sounds can be distinguished, i.e. tubular bells and piano, which are separated by every feature from our MFCC group.
Fig. 6 Mean values of energy for the investigated instruments
Our presumption is confirmed by the post hoc results (Fig. 7). The energy parameter formed eight homogeneous groups, and four of them were determined by single instruments: piano, violin, vibraphone and trumpet.
Energy turned out to be quite discriminative as a single parameter. We are aware that if more input data are added (more recordings), our outcomes may need re-adjustment; still, discriminating four instruments (piano, violin, vibraphone and trumpet) on the basis of one feature confirms the high discriminative power of this attribute.
The results of MANOVA show that the mean values of the tris parameters were significantly different for the studied set of instruments (F = 210.9, p < 0.01).
The univariate F(13, 5,866) values are as follows: 352.08 for tris1, 40.35 for tris2, 402.114 for tris3, 280.86 for tris4, 84.14 for tris5, 19.431 for tris6, 12.645 for tris7, 436.89 for tris8, and 543.39 for tris9; all these values indicate significant differences between the means of the studied instruments at the significance level of 0.01. For the Tris feature set, consisting of the tris1, . . . , tris9 parameters, each parameter defined from three to nine groups, as presented in Fig. 8.
As we can see, tris3, tris8, and tris9 produced the highest numbers of homogeneous groups. Some wind instruments (trumpet, trombone, English horn, oboe), or their pairs, were distinguished most easily: they determined separate groups for the features forming eight to nine homogeneous groups.
Taking into consideration the plots of mean values (Fig. 9), we can add some more information. Namely, the tris3 parameter, in spite of constituting the lowest number of homogeneous groups, distinguishes piano very well. In most cases, the means of vibraphone and marimba, and sometimes also piano, are similar, and when they are high, then at the same time the means for oboe and trumpet are low, and vice versa.
In the case of the Tris group, we were expecting good results, since these features were especially designed for the purpose of musical instrument identification. For instance, clarinet shows a low content of even harmonic partials in its spectrum for lower sounds (tris8). However, as we can see, other instruments (piano, vibraphone) also show a low content of even partials, and marimba even lower than these instruments. On the other hand, clarinet shows a very high tris9, i.e. the amount of odd harmonic partials (excluding the fundamental, marked as no. 1) in the spectrum, which corresponds to the small amount of even partials, and this feature discriminates clarinet very well. The results for the Tris parameters, presented in Fig. 8, show that this set of features is quite well designed and can be applied as a helpful tool for musical instrument identification purposes.
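The even/odd-partial interpretation of tris8 and tris9 can be illustrated as below: given the amplitudes of the harmonic partials, the sketch computes the proportion of spectral power in even partials and in odd partials excluding the fundamental. The exact Tris definitions from the feature set are not restated here, so this is only an approximation of the idea.

```python
import numpy as np

def even_odd_content(harmonic_amps: np.ndarray) -> tuple:
    """Given amplitudes of harmonic partials 1..N (index 0 = fundamental),
    return the proportions of power in even partials and in odd partials
    excluding the fundamental - an approximation of the tris8/tris9 idea."""
    power = np.asarray(harmonic_amps, dtype=float) ** 2
    total = power.sum()
    even = power[1::2].sum() / total    # partials 2, 4, 6, ...
    odd = power[2::2].sum() / total     # partials 3, 5, 7, ... (fundamental excluded)
    return even, odd
```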
The univariate F(13, 5,866) values include flat18: 106.26, flat19: 90.84, flat20: 62.56, flat21: 54.47, flat22: 62.54, flat23: 60.28, flat24: 70.61, and flat25: 58.4. The plots of means for these parameters are presented in Fig. 10. As we can see, the higher the number of the element of this feature vector (i.e. the higher the frequency), the higher the mean values of the flatness parameters.
In the first four plots we can notice that most values are at a similar level, except for marimba and vibraphone, whose means are high compared to the other instruments. Then the values for the other instruments get higher and higher, except for clarinet, viola, oboe, English horn, cello and trumpet, whose means change to a lesser degree. We can also observe these changes in the results of the post hoc comparisons. They show the high discriminating power of flat10, . . . , flat14, which distinguish marimba, vibraphone, and French horn (these instruments constitute separate, 1-element groups), and, to a lesser degree, piano (Fig. 11). For these features (flat10, . . . , flat14), some 1-element groups are produced, and for the subsequent features from the AudioSpectrumFlatness set, the size of the homogeneous groups grows. To be more precise, with increasing i in flati, the group consisting of marimba, vibraphone, and French horn was growing, as other instruments were added to it. At the same time, the homogeneous group determined by oboe, clarinet, trumpet, violin, and English horn was differentiating into separate groups.
Fig. 10 Plots of mean values of the AudioSpectrumFlatness parameters (flat10, . . . , flat25) for the investigated instruments
Fig. 11 (continued) Homogeneous groups for flat25 (mean values of flat25 for each instrument; each x marks membership in one of groups 1-7)
ctrumpet       0.768871   x
oboe           0.825799   x
violin         0.847598   x x
viola          0.852023   x x
cello          0.854737   x
englishhorn    0.861977   x x
bflatclarinet  0.887746   x x
tubularbells   0.895093   x x
tenorTrombone  0.898937   x x
frenchhorn     0.903252   x x x
marimba        0.906131   x x x
flute          0.915134   x x
vibraphone     0.918556   x x
piano          0.926591   x
AudioSpectrumFlatness is the biggest feature set analyzed here. The high discriminative power of spectral flatness is confirmed by the results shown in Fig. 11, since in many cases 1-element groups are created with respect to particular elements of the flatness feature vector. This illustrates the high descriptive power of the shape of the spectrum, represented here by the spectral flatness.
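The idea behind these descriptors can be sketched as follows: spectral flatness within a band is the ratio of the geometric mean to the arithmetic mean of the power spectrum in that band (close to 1 for noise-like content, close to 0 for tonal content). The MPEG-7 AudioSpectrumFlatness descriptor prescribes specific quarter-octave bands, which the sketch does not reproduce; the band edges here are illustrative.

```python
import numpy as np

def band_flatness(power_spectrum: np.ndarray, freqs: np.ndarray,
                  f_lo: float, f_hi: float, eps: float = 1e-12) -> float:
    """Spectral flatness (geometric mean / arithmetic mean of power) within
    the band [f_lo, f_hi); band edges are illustrative, not the MPEG-7 ones."""
    band = power_spectrum[(freqs >= f_lo) & (freqs < f_hi)] + eps
    geometric_mean = np.exp(np.mean(np.log(band)))
    return float(geometric_mean / np.mean(band))
```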
5 Conclusions
In this paper, we compared feature sets used for musical instrument sound classification. Mean values for data representing the given instruments and statistical tests for these data were presented and discussed. Also, for each feature, homogeneous groups were found, representing instruments which are similar with respect to this feature. Instruments for which the mean values of a given feature were significantly different were assigned to different groups, and instruments for which the mean values were not statistically different were assigned to the same group.
Sound features were grouped according to the parameterization method, in-
cluding MFCC, proportions of harmonics in the sound spectrum, and MPEG-7
based parameters (AudioSpectrumFlatness, AudioSpectrumBasis). These groups
were chosen as a conclusion of our previous research, indicating high discriminant
power of particular features for instrument discrimination purposes.
Piano, vibraphone, marimba, cello, English horn, French horn, and trombone turned out to be the most discernible instruments. This is very encouraging, because marimba and vibraphone represent idiophones (a part of the percussion group), whose sound is produced by striking and is not sustained (similarly for piano), so there is no steady state, which makes parameterization more challenging. Also, since the investigations were performed for small groups of features (up to 16), we conclude that these groups constitute a good basis for instrument discernment.
The results enabled us to indicate, for each instrument, which parameters within a given group have the highest distinguishing power, i.e. which features are most suitable for distinguishing this instrument. Following the earlier research based on the sound features described here and SVM classifiers (Wieczorkowska et al. 2008), experiments on automatic musical instrument identification were also performed using random forests as classifiers (Kursa et al. 2009). The obtained results confirmed the significance of particular features and yielded very good accuracy.
Acknowledgements The presented work was partially supported by the Research Center of PJIIT,
supported by the Polish National Committee for Scientific Research (KBN).
The authors would like to thank Elżbieta Kubera from the University of Life Sciences in Lublin
for help with preparing the initial data for experiments and improving the description of features.
Open Access This article is distributed under the terms of the Creative Commons Attribution
Noncommercial License which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
Glass, G. V., & Hopkins, K. D. (1995). Statistical methods in education and psychology. Allyn &
Bacon.
Herrera, P., Amatriain, X., Batlle, E., & Serra, X. (2000). Towards instrument segmentation for music
content description: A critical review of instrument classification techniques. In International
symposium on music information retrieval ISMIR.
ISO/IEC JTC1/SC29/WG11 (2004). MPEG-7 overview. Available at http://www.chiariglione.org/
mpeg/standards/mpeg-7/mpeg-7.htm.
Itoyama, K., Goto, M., Komatani, K., Ogata, T., & Okuno, H. G. (2008). Instrument equalizer for
query-by-example retrieval: Improving sound source separation based on integrated harmonic
and inharmonic models. In 9th international conference on music information retrieval ISMIR.
Klapuri, A. (2004). Signal processing methods for the automatic transcription of music. Ph.D. thesis,
Tampere University of Technology, Finland.
Kursa, M., Rudnicki, W., Wieczorkowska, A., Kubera, E., & Kubik-Komar, A. (2009). Musical
instruments in random forest. In J. Rauch, Z. W. Ras, P. Berka, & T. Elomaa (Eds.), Foundations
of intelligent systems, 18th international symposium, ISMIS 2009, Prague, Czech Republic, 14–17
September 2009, Proceedings. LNAI 5722 (pp. 281–290). Berlin Heidelberg: Springer-Verlag.
Lindman, H. R. (1974). Analysis of variance in complex experimental designs. San Francisco: W. H.
Freeman & Co.
Little, D., & Pardo, B. (2008). Learning musical instruments from mixtures of audio with weak labels.
In 9th international conference on music information retrieval ISMIR.
Logan, B. (2000). Mel frequency cepstral coefficients for music modeling. In International symposium
on music information retrieval MUSIC IR.
Morrison, D. F. (1990). Multivariate statistical methods (3rd ed.). New York: McGraw Hill.
NSF (2010). Automatic indexing of audio with timbre information for musical instruments of definite pitch. http://www.mir.uncc.edu/.
Opolko, F., & Wapnick, J. (1987). MUMS—McGill University master samples. CD’s.
Rao, C. R. (1951). An asymptotic expansion of the distribution of Wilks’ criterion. Bulletin of the
International Statistical Institute, 33, 177–181.
StatSoft, Inc. (2001). STATISTICA, version 6. http://www.statsoft.com/.
Tukey, J. W. (1993). The problem of multiple comparisons. Multiple comparisons: 1948–1983. In
H. I. Braun (Ed.), The collected works of John W. Tukey (vol. VIII, pp. 1–300). Chapman Hall.
Viste, H., & Evangelista, G. (2003). Separation of harmonic instruments with overlapping partials
in multi-channel mixtures. In IEEE workshop on applications of signal processing to audio and
acoustics WASPAA-03, New Paltz, NY.
Wieczorkowska, A., & Czyzewski, A. (2003). Rough set based automatic classification of musical
instrument sounds. In International workshop on rough sets in knowledge discovery and soft
computing RSKD. Warsaw, Poland: Elsevier.
Wieczorkowska, A. A., & Kubera, E. (2009). Identification of a dominating instrument in polytimbral same-pitch mixes using SVM classifiers with non-linear kernel. Journal of Intelligent Information Systems. doi:10.1007/s10844-009-0098-3.
Wieczorkowska, A., Kubera, E., & Kubik-Komar, A. (2008). Analysis of recognition of a musical in-
strument in sound mixes using support vector machines. In H. S. Nguyen & V.-N. Huynh (Eds.),
SCKT-08: Soft computing for knowledge technology workshop, Hanoi, Vietnam, December 2008, proceedings. Tenth Pacific Rim international conference on artificial intelligence PRICAI 2008 (pp. 110–121).
Wieczorkowska, A., & Kubik-Komar, A. (2009a). Application of analysis of variance to assess-
ment of influence of sound feature groups on discrimination between musical instruments. In:
J. Rauch, Z. W. Ras, P. Berka, & T. Elomaa (Eds.), Foundations of intelligent systems,
18th international symposium, ISMIS 2009, Prague, Czech Republic, proceedings. LNAI 5722
(pp. 291–300). Berlin Heidelberg: Springer-Verlag.
Wieczorkowska, A., & Kubik-Komar, A. (2009b). Application of discriminant analysis to distinction
of musical instruments on the basis of selected sound parameters. In: K. A. Cyran, S. Kozielski,
J. F. Peters, U. Stanczyk, & A. Wakulicz-Deja (Eds.), Man-machine interactions. Advances in
intelligent and soft computing (Vol. 59, pp. 407–416). Berlin Heidelberg: Springer-Verlag.
Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design (3rd ed.). New York: McGraw-Hill.
Zhang, X. (2007). Cooperative music retrieval based on automatic indexing of music by instruments
and their types. Ph.D thesis, Univ. North Carolina, Charlotte.