Skip to main content

Xavier Serra

Pompeu Fabra University, Department of Information and Communication Technologies (DTIC), Faculty Member

Pompeu Fabra University, Communication and Information Technologies, Faculty Member

Followers

411

Following

19

Co-authors

12

Public Views

InterestsView All (6)

Uploads

Papers by Xavier Serra

PARSHL: an analysis/synthesis program for non-harmonic sounds based on a sinusoidal representation

Proc. Int. Computer Music Conf, Aug 1, 1987

This paper describes a peak-tracking spectrum analyzer, called Parshl, which is useful for extrac... more This paper describes a peak-tracking spectrum analyzer, called Parshl, which is useful for extracting additive synthesis parameters from inharmonic sounds such as the piano. Parshl is based on the Short-Time Fourier Transform (STFT), adding features for tracking the amplitude, frequency, and phase trajectories of spectral lines from one FFT to the next. Parshl can be thought of as an “inharmonic phase vocoder” which uses tracking vocoder analysis channels instead of a fixed harmonic filter bank as used in previous FFT- ...

Assessing The Tuning Of Sung Indian Classical Music

The issue of tuning in Indian classical music has been, historically, a matter of theoretical deb... more The issue of tuning in Indian classical music has been, historically, a matter of theoretical debate. In this paper, we study its contemporary practice in sung performances of Carnatic and Hindustani music following an empiric and quantitative approach. To do so, we select stable fundamental frequencies, estimated via a standard algorithm, and construct interval histograms from a pool of recordings. We then compare such histograms against the ones obtained for different music sources and against the theoretical values derived from 12-note just intonation and equal temperament. Our results evidence that the tunings in Carnatic and Hindustani music differ, the former tending to a just intonation system and the latter having much equal-tempered influences. Carnatic music also presents signs of a more continuous distribution of pitches. Further subdivisions of the octave are partially investigated, finding no strong evidence of them.

Phrase-based rĀga recognition using vector space modeling

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016

mining of indian music by extracting arohana-avarohana pattern, " Int.

Audio feature extraction for exploring Turkish makam music

Comunicacio presentada a: ATMM 2014 The 3rd International Conference on Audio Technologies for Mu... more

A Corpus for Computational Research of Turkish Makam Music

Proceedings of the 1st International Workshop on Digital Libraries for Musicology, 2014

Each music tradition has its own characteristics in terms of melodic, rhythmic and timbral proper... more Each music tradition has its own characteristics in terms of melodic, rhythmic and timbral properties as well as semantic understandings. To analyse, discover and explore these culture-specific characteristics, we need music collections which are representative of the studied aspects of the music tradition. For Turkish makam music, there are various resources available such as audio recordings, music scores, lyrics and editorial metadata. However, most of these resources are not typically suited for computational analysis, are hard to access, do not have su cient quality or do not include adequate descriptive information. In this paper we present a corpus of Turkish makam music created within the scope of the CompMusic project. The corpus is intended for computational research and the primary considerations during the creation of the corpus reflect some criteria, namely, purpose, coverage, completeness, quality and re-usability. So far, we have gathered approximately 6000 audio recordings, 2200 music scores with lyrics and 27000 instances of editorial metadata related to Turkish makam music. The metadata include information about makams, recordings, scores, compositions , artists etc. as well as the interrelations between them. In this paper, we also present several test datasets of Turkish makam music. Test datasets contain manual annotations by experts and they provide ground truth for specific computational tasks to test, calibrate and improve the research tools. We hope that this research corpus and the test datasets will facilitate academic studies in several fields such as music information retrieval and computational musicology.

SIMAC: Semantic interaction with music audio contents

Characterization of the Freesound online community

2012 3rd International Workshop on Cognitive Information Processing, CIP 2012, 2012

There are many online communities with membergenerated and openly available multimedia content. T... more There are many online communities with membergenerated and openly available multimedia content. Their success depends on having active contributing users and on producing useful content. With this criterion, the community of sound practitioners that has emerged in Freesound is a successful case of interest to be studied. But to understand it and support it further we need an appropriate analysis methodology. In this paper we propose some qualitative and quantitative approaches for its characterization, focusing on the analysis of organizational structure, shared goals, user interactions and vocabulary sharing. We think that the proposed approach can be applied to other online communities with similar characteristics.

Small world networks and creativity in audio clip sharing

International Journal of Social Network Mining, 2012

Sharing communities are changing the way audio clips are obtained in several areas, ranging from ... more Sharing communities are changing the way audio clips are obtained in several areas, ranging from music to game design. The motivations for people to record and upload sounds to these sites are likely to be related to social factors. In this paper we describe several Perfecto Herrera received the degree in psychology from the University of Barcelona, Barcelona, Spain, in 1987. He was with the University of Barcelona as a Software Developer and as assistant professor. His further studies focused on sound engineering, audio postproduction, and computer music. Now he is finishing his PhD on music content processing in the Universitat Pompeu Fabra (UPF), Barcelona. He has been working with the Music Technology Group, (UPF) since its inception in 1996, first as responsible for the sound laboratory/studio, then as a researcher. He worked in the MPEG-7 standardization initiative from 1999 to 2001. Then, he collaborated in the EU-IST-funded CUIDADO project, contributing to the research and development of tools for indexing and retrieving music and sound collections. This work was somehow continued and expanded as scientific coordinator for the Semantic Interaction with Music Audio Contents (SIMAC) project, again funded by the EU-IST. He is currently the Head of the Department of Sonology, Higher Music School of Catalonia (ESMUC), where he teaches music technology and psychoacoustics. His main research interests are music content processing, classification, and music perception and cognition. Massimiliano Zanin received the degree in Aeronautical Management from the

Music perception with current signal processing strategies for cochlear implants

Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies, 2011

This work presents a brief review on hearing with cochlear implants with emphasis on music percep... more This work presents a brief review on hearing with cochlear implants with emphasis on music perception. Although speech perception in noise with cochlear implants is still the major challenge, music perception is becoming more and more important. Music can modulate emotions and stimulate the brain in different ways than speech, for this reason, music can impact in quality of life for cochlear implant users. In this paper we present traditional and new trends to improve the perception of pitch with cochlear implants as well as some signal processing methods that have been designed with the aim to improve music perception. Finally, a review of music evaluation methods will be presented.

A Data Mining Approach to Expressive Music Performance Modeling

Multimedia Data Mining and Knowledge Discovery

In this chapter we present a data mining approach to one of the most challenging aspects of compu... more In this chapter we present a data mining approach to one of the most challenging aspects of computer music: modeling the knowledge applied by a musician when performing a score in order to produce an expressive performance of a piece. We apply data mining techniques to real performance data (i.e., audio recordings) in order to induce an expressive performance model. This leads to an expressive performance system consisting of three components: (1) a melodic transcription component that extracts a set of acoustic features from the audio recordings, (2) a data mining component that induces an expressive transformation model from the set of extracted acoustic features, and (3) a melody synthesis component that generates expressive monophonic output (MIDI or audio) from inexpressive melody descriptions using the induced expressive transformation model. We describe, explore, and compare different data mining techniques for inducing the expressive transformation model.

Computational models of music perception and cognition II: Domain-specific music processing

Physics of Life Reviews, 2008

Analysis/synthesis comparison

Organised Sound, 2000

We compared six sound analysis/synthesis systems used for computer music. Each system analysed th... more We compared six sound analysis/synthesis systems used for computer music. Each system analysed the same collection of twenty-seven varied input sounds, and output the results in Sound Description Interchange Format (SDIF). We describe each system individually then compare the systems in terms of availability, the sound model(s) they use, interpolation models, noise modelling, the mutability of various sound models, the parameters that must be set to perform analysis, and characteristic artefacts. Although we have not directly compared the analysis results among the different systems, our work has made such a comparison possible.

Acoustic Cues to Beat Induction: A Machine Learning Perspective

Music Perception, 2006

This article brings forward the question of which acoustic features are the most adequate for ide... more This article brings forward the question of which acoustic features are the most adequate for identifying beats computationally in acoustic music pieces. We consider many different features computed on consecutive short portions of acoustic signal, among which those currently promoted in the literature on beat induction from acoustic signals and several original features, unmentioned in this literature. Evaluation of feature sets regarding their ability to provide reliable cues to the localization of beats is based on a machine learning methodology with a large corpus of beat-annotated music pieces, in audio format, covering distinctive music categories. Confirming common knowledge, energy is shown to be a very relevant cue to beat induction (especially the temporal variation of energy in various frequency bands, with the special relevance of frequency bands below 500 Hz and above 5 kHz). Some of the new features proposed in this paper are shown to outperform features currently prom...

Transmission Two: The Great Excursion (TT:TGE)": The Aesthetic, Art and Science of a Composition for Radio

Leonardo Music Journal, 1991

Sonic composition conceived to be heard through loudspeakers is one of the unique and original ar... more Sonic composition conceived to be heard through loudspeakers is one of the unique and original art forms of the second half of the twentieth century. Concert halls were assumed at the beginning to be the proper venues for presenting this kind of work, but it was not long before composers began searching for other options. Morton Subotnick, for instance, was among the first to try subverting such traditional presentation by composing an electronic work (SilverAPgles oftheMoon [ 19671) thatwould be available only on a long-playing phonograph recording [l]. This piece was intended to be played on a normal high-fidelity system in listeners' homes and in no other way. In the past decade another idea for bringing original sonic art directly into the home-again, work intended for only this means of dissemination-has begun to flourish. This is Hiirpiel, a sonic art form that has developed into something truly indigenous to radio. Hiirspiel, or 'listen play', shortened from the term Neues Hiirspkl, was coined by German radio producer Klaus Schoening, who, since the late 1960s, has created radio plays by combining sounds and words in nontraditional collages. Schoening has since broadened the concept, now calling it .am acustica. It is, as he says, "for ears and imagination" [2]. As the name suggests, there has been a particular flurry of such activity in Germany, but there are now composers and other artists involved with sound and theater who are working on similar projects all over the world. Larly Austin (composer, writer, teacher), 2109 Woodbrook, Denton, TX 76205, U.S.A. Charles b o n e (composer, writer),

Detecting Solo Phrases in Music Using Spectral and Pitch-related Descriptors

Journal of New Music Research, 2009

In this paper we present an algorithm for segmenting musical audio data. Our aim is to identify s... more In this paper we present an algorithm for segmenting musical audio data. Our aim is to identify solo instrument phrases in polyphonic music. We extract relevant features from the audio to be input into our algorithm. A large corpus of audio descriptors was tested for its ability to discriminate between solo and non-solo sections, which resulted in a subset of five best features. We derived a two-stage algorithm that first creates a set of boundary candidates from local changes of these features and then classifies fixed-length segments according to the desired target classes. The output of the two stages is combined to derive the final segmentation and segment labels. Our system was trained and tested with excerpts from classical pieces and evaluated using full-length recordings, all taken from commercially available audio. We evaluated our algorithm by using precision and recall measurements for the boundary estimation and introduced new evaluation metrics from image processing for the final segmentation. Along with a resulting accuracy of 77%, we demonstrate that the selected features are discriminative for this specific task and achieve reasonable results for the segmentation problem.

Statistical Analysis of Chroma Features in Western Music Predicts Human Judgments of Tonality

Journal of New Music Research, 2008

Motivated by evidence that image source statistics predict the response properties of several vis... more Motivated by evidence that image source statistics predict the response properties of several visual perception aspects, we provide an empirical analysis of the relation between chroma statistics and human judgments of tonality. To accomplish this, a statistical analysis method based on chroma feature covariance is proposed. It makes use of a large collection of western music to build a tonal profile. The obtained profile is compared to alternative tonal profiles proposed in the literature, either cognitively, perceptually, or theoretically inspired. The high degree of correlation we find between the covariance-based tonal profile proposed here and several ones proposed in the literature (reaching values higher than 0.9) is interpreted as evidence that human-derived profiles faithfully reflect the statistics of the musical input listeners have been exposed to. Furthermore, we show that very short time scales allow us to correctly predict these profiles, which brings us to discuss the role that local-scale implicit learning plays in building mental representations of tonality.

Synthesis of the Singing Voice by Performance Sampling and Spectral Models

IEEE Signal Processing Magazine, 2007

Performance-Based Interpreter Identification in Saxophone Audio Recordings

IEEE Transactions on Circuits and Systems for Video Technology, 2007

Predictability of Music Descriptor Time Series and its Application to Cover Song Detection

IEEE Transactions on Audio, Speech, and Language Processing, 2011

Intuitively, music has both predictable and unpredictable components. In this work we assess this... more Intuitively, music has both predictable and unpredictable components. In this work we assess this qualitative statement in a quantitative way using common time series models fitted to state-of-the-art music descriptors. These descriptors cover different musical facets and are extracted from a large collection of real audio recordings comprising a variety of musical genres. Our findings show that music descriptor time series exhibit a certain predictability not only for short time intervals, but also for mid-term and relatively long intervals. This fact is observed independently of the descriptor, musical facet and time series model we consider. Moreover, we show that our findings are not only of theoretical relevance but can also have practical impact. To this end we demonstrate that music predictability at relatively long time intervals can be exploited in a real-world application, namely the automatic identification of cover songs (i.e. different renditions or versions of the same musical piece). Importantly, this prediction strategy yields a parameter-free approach for cover song identification that is substantially faster, allows for reduced computational storage and still maintains highly competitive accuracies when compared to state-of-the-art systems.

Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification

IEEE Transactions on Audio, Speech, and Language Processing, 2008

We present a new technique for audio signal comparison based on tonal subsequence alignment and i... more We present a new technique for audio signal comparison based on tonal subsequence alignment and its application to detect cover versions (i.e., different performances of the same underlying musical piece). Cover song identification is a task whose popularity has increased in the Music Information Retrieval (MIR) community along in the past, as it provides a direct and objective way to evaluate music similarity algorithms. This article first presents a series of experiments carried out with two state-of-the-art methods for cover song identification. We have studied several components of these (such as chroma resolution and similarity, transposition, beat tracking or Dynamic Time Warping constraints), in order to discover which characteristics would be desirable for a competitive cover song identifier. After analyzing many cross-validated results, the importance of these characteristics is discussed, and the best-performing ones are finally applied to the newly proposed method. Multiple evaluations of this one confirm a large increase in identification accuracy when comparing it with alternative state-of-the-art approaches.

PARSHL: an analysis/synthesis program for non-harmonic sounds based on a sinusoidal representation

Proc. Int. Computer Music Conf, Aug 1, 1987

This paper describes a peak-tracking spectrum analyzer, called Parshl, which is useful for extrac... more This paper describes a peak-tracking spectrum analyzer, called Parshl, which is useful for extracting additive synthesis parameters from inharmonic sounds such as the piano. Parshl is based on the Short-Time Fourier Transform (STFT), adding features for tracking the amplitude, frequency, and phase trajectories of spectral lines from one FFT to the next. Parshl can be thought of as an “inharmonic phase vocoder” which uses tracking vocoder analysis channels instead of a fixed harmonic filter bank as used in previous FFT- ...

Assessing The Tuning Of Sung Indian Classical Music

The issue of tuning in Indian classical music has been, historically, a matter of theoretical deb... more The issue of tuning in Indian classical music has been, historically, a matter of theoretical debate. In this paper, we study its contemporary practice in sung performances of Carnatic and Hindustani music following an empiric and quantitative approach. To do so, we select stable fundamental frequencies, estimated via a standard algorithm, and construct interval histograms from a pool of recordings. We then compare such histograms against the ones obtained for different music sources and against the theoretical values derived from 12-note just intonation and equal temperament. Our results evidence that the tunings in Carnatic and Hindustani music differ, the former tending to a just intonation system and the latter having much equal-tempered influences. Carnatic music also presents signs of a more continuous distribution of pitches. Further subdivisions of the octave are partially investigated, finding no strong evidence of them.

Phrase-based rĀga recognition using vector space modeling

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016

mining of indian music by extracting arohana-avarohana pattern, " Int.

Audio feature extraction for exploring Turkish makam music

Comunicacio presentada a: ATMM 2014 The 3rd International Conference on Audio Technologies for Mu... more

A Corpus for Computational Research of Turkish Makam Music

Proceedings of the 1st International Workshop on Digital Libraries for Musicology, 2014

Each music tradition has its own characteristics in terms of melodic, rhythmic and timbral proper... more Each music tradition has its own characteristics in terms of melodic, rhythmic and timbral properties as well as semantic understandings. To analyse, discover and explore these culture-specific characteristics, we need music collections which are representative of the studied aspects of the music tradition. For Turkish makam music, there are various resources available such as audio recordings, music scores, lyrics and editorial metadata. However, most of these resources are not typically suited for computational analysis, are hard to access, do not have su cient quality or do not include adequate descriptive information. In this paper we present a corpus of Turkish makam music created within the scope of the CompMusic project. The corpus is intended for computational research and the primary considerations during the creation of the corpus reflect some criteria, namely, purpose, coverage, completeness, quality and re-usability. So far, we have gathered approximately 6000 audio recordings, 2200 music scores with lyrics and 27000 instances of editorial metadata related to Turkish makam music. The metadata include information about makams, recordings, scores, compositions , artists etc. as well as the interrelations between them. In this paper, we also present several test datasets of Turkish makam music. Test datasets contain manual annotations by experts and they provide ground truth for specific computational tasks to test, calibrate and improve the research tools. We hope that this research corpus and the test datasets will facilitate academic studies in several fields such as music information retrieval and computational musicology.

SIMAC: Semantic interaction with music audio contents

Characterization of the Freesound online community

2012 3rd International Workshop on Cognitive Information Processing, CIP 2012, 2012

There are many online communities with membergenerated and openly available multimedia content. T... more There are many online communities with membergenerated and openly available multimedia content. Their success depends on having active contributing users and on producing useful content. With this criterion, the community of sound practitioners that has emerged in Freesound is a successful case of interest to be studied. But to understand it and support it further we need an appropriate analysis methodology. In this paper we propose some qualitative and quantitative approaches for its characterization, focusing on the analysis of organizational structure, shared goals, user interactions and vocabulary sharing. We think that the proposed approach can be applied to other online communities with similar characteristics.

Small world networks and creativity in audio clip sharing

International Journal of Social Network Mining, 2012

Sharing communities are changing the way audio clips are obtained in several areas, ranging from ... more Sharing communities are changing the way audio clips are obtained in several areas, ranging from music to game design. The motivations for people to record and upload sounds to these sites are likely to be related to social factors. In this paper we describe several Perfecto Herrera received the degree in psychology from the University of Barcelona, Barcelona, Spain, in 1987. He was with the University of Barcelona as a Software Developer and as assistant professor. His further studies focused on sound engineering, audio postproduction, and computer music. Now he is finishing his PhD on music content processing in the Universitat Pompeu Fabra (UPF), Barcelona. He has been working with the Music Technology Group, (UPF) since its inception in 1996, first as responsible for the sound laboratory/studio, then as a researcher. He worked in the MPEG-7 standardization initiative from 1999 to 2001. Then, he collaborated in the EU-IST-funded CUIDADO project, contributing to the research and development of tools for indexing and retrieving music and sound collections. This work was somehow continued and expanded as scientific coordinator for the Semantic Interaction with Music Audio Contents (SIMAC) project, again funded by the EU-IST. He is currently the Head of the Department of Sonology, Higher Music School of Catalonia (ESMUC), where he teaches music technology and psychoacoustics. His main research interests are music content processing, classification, and music perception and cognition. Massimiliano Zanin received the degree in Aeronautical Management from the

Music perception with current signal processing strategies for cochlear implants

Proceedings of the 4th International Symposium on Applied Sciences in Biomedical and Communication Technologies, 2011

This work presents a brief review on hearing with cochlear implants with emphasis on music percep... more This work presents a brief review on hearing with cochlear implants with emphasis on music perception. Although speech perception in noise with cochlear implants is still the major challenge, music perception is becoming more and more important. Music can modulate emotions and stimulate the brain in different ways than speech, for this reason, music can impact in quality of life for cochlear implant users. In this paper we present traditional and new trends to improve the perception of pitch with cochlear implants as well as some signal processing methods that have been designed with the aim to improve music perception. Finally, a review of music evaluation methods will be presented.

A Data Mining Approach to Expressive Music Performance Modeling

Multimedia Data Mining and Knowledge Discovery

In this chapter we present a data mining approach to one of the most challenging aspects of compu... more In this chapter we present a data mining approach to one of the most challenging aspects of computer music: modeling the knowledge applied by a musician when performing a score in order to produce an expressive performance of a piece. We apply data mining techniques to real performance data (i.e., audio recordings) in order to induce an expressive performance model. This leads to an expressive performance system consisting of three components: (1) a melodic transcription component that extracts a set of acoustic features from the audio recordings, (2) a data mining component that induces an expressive transformation model from the set of extracted acoustic features, and (3) a melody synthesis component that generates expressive monophonic output (MIDI or audio) from inexpressive melody descriptions using the induced expressive transformation model. We describe, explore, and compare different data mining techniques for inducing the expressive transformation model.

Computational models of music perception and cognition II: Domain-specific music processing

Physics of Life Reviews, 2008

Analysis/synthesis comparison

Organised Sound, 2000

We compared six sound analysis/synthesis systems used for computer music. Each system analysed th... more We compared six sound analysis/synthesis systems used for computer music. Each system analysed the same collection of twenty-seven varied input sounds, and output the results in Sound Description Interchange Format (SDIF). We describe each system individually then compare the systems in terms of availability, the sound model(s) they use, interpolation models, noise modelling, the mutability of various sound models, the parameters that must be set to perform analysis, and characteristic artefacts. Although we have not directly compared the analysis results among the different systems, our work has made such a comparison possible.

Acoustic Cues to Beat Induction: A Machine Learning Perspective

Music Perception, 2006

This article brings forward the question of which acoustic features are the most adequate for ide... more This article brings forward the question of which acoustic features are the most adequate for identifying beats computationally in acoustic music pieces. We consider many different features computed on consecutive short portions of acoustic signal, among which those currently promoted in the literature on beat induction from acoustic signals and several original features, unmentioned in this literature. Evaluation of feature sets regarding their ability to provide reliable cues to the localization of beats is based on a machine learning methodology with a large corpus of beat-annotated music pieces, in audio format, covering distinctive music categories. Confirming common knowledge, energy is shown to be a very relevant cue to beat induction (especially the temporal variation of energy in various frequency bands, with the special relevance of frequency bands below 500 Hz and above 5 kHz). Some of the new features proposed in this paper are shown to outperform features currently prom...

Transmission Two: The Great Excursion (TT:TGE)": The Aesthetic, Art and Science of a Composition for Radio

Leonardo Music Journal, 1991

Sonic composition conceived to be heard through loudspeakers is one of the unique and original ar... more Sonic composition conceived to be heard through loudspeakers is one of the unique and original art forms of the second half of the twentieth century. Concert halls were assumed at the beginning to be the proper venues for presenting this kind of work, but it was not long before composers began searching for other options. Morton Subotnick, for instance, was among the first to try subverting such traditional presentation by composing an electronic work (SilverAPgles oftheMoon [ 19671) thatwould be available only on a long-playing phonograph recording [l]. This piece was intended to be played on a normal high-fidelity system in listeners' homes and in no other way. In the past decade another idea for bringing original sonic art directly into the home-again, work intended for only this means of dissemination-has begun to flourish. This is Hiirpiel, a sonic art form that has developed into something truly indigenous to radio. Hiirspiel, or 'listen play', shortened from the term Neues Hiirspkl, was coined by German radio producer Klaus Schoening, who, since the late 1960s, has created radio plays by combining sounds and words in nontraditional collages. Schoening has since broadened the concept, now calling it .am acustica. It is, as he says, "for ears and imagination" [2]. As the name suggests, there has been a particular flurry of such activity in Germany, but there are now composers and other artists involved with sound and theater who are working on similar projects all over the world. Larly Austin (composer, writer, teacher), 2109 Woodbrook, Denton, TX 76205, U.S.A. Charles b o n e (composer, writer),

Detecting Solo Phrases in Music Using Spectral and Pitch-related Descriptors

Journal of New Music Research, 2009

In this paper we present an algorithm for segmenting musical audio data. Our aim is to identify s... more In this paper we present an algorithm for segmenting musical audio data. Our aim is to identify solo instrument phrases in polyphonic music. We extract relevant features from the audio to be input into our algorithm. A large corpus of audio descriptors was tested for its ability to discriminate between solo and non-solo sections, which resulted in a subset of five best features. We derived a two-stage algorithm that first creates a set of boundary candidates from local changes of these features and then classifies fixed-length segments according to the desired target classes. The output of the two stages is combined to derive the final segmentation and segment labels. Our system was trained and tested with excerpts from classical pieces and evaluated using full-length recordings, all taken from commercially available audio. We evaluated our algorithm by using precision and recall measurements for the boundary estimation and introduced new evaluation metrics from image processing for the final segmentation. Along with a resulting accuracy of 77%, we demonstrate that the selected features are discriminative for this specific task and achieve reasonable results for the segmentation problem.

Statistical Analysis of Chroma Features in Western Music Predicts Human Judgments of Tonality

Journal of New Music Research, 2008

Motivated by evidence that image source statistics predict the response properties of several vis... more Motivated by evidence that image source statistics predict the response properties of several visual perception aspects, we provide an empirical analysis of the relation between chroma statistics and human judgments of tonality. To accomplish this, a statistical analysis method based on chroma feature covariance is proposed. It makes use of a large collection of western music to build a tonal profile. The obtained profile is compared to alternative tonal profiles proposed in the literature, either cognitively, perceptually, or theoretically inspired. The high degree of correlation we find between the covariance-based tonal profile proposed here and several ones proposed in the literature (reaching values higher than 0.9) is interpreted as evidence that human-derived profiles faithfully reflect the statistics of the musical input listeners have been exposed to. Furthermore, we show that very short time scales allow us to correctly predict these profiles, which brings us to discuss the role that local-scale implicit learning plays in building mental representations of tonality.

Synthesis of the Singing Voice by Performance Sampling and Spectral Models

IEEE Signal Processing Magazine, 2007

Performance-Based Interpreter Identification in Saxophone Audio Recordings

IEEE Transactions on Circuits and Systems for Video Technology, 2007

Predictability of Music Descriptor Time Series and its Application to Cover Song Detection

IEEE Transactions on Audio, Speech, and Language Processing, 2011

Intuitively, music has both predictable and unpredictable components. In this work we assess this... more Intuitively, music has both predictable and unpredictable components. In this work we assess this qualitative statement in a quantitative way using common time series models fitted to state-of-the-art music descriptors. These descriptors cover different musical facets and are extracted from a large collection of real audio recordings comprising a variety of musical genres. Our findings show that music descriptor time series exhibit a certain predictability not only for short time intervals, but also for mid-term and relatively long intervals. This fact is observed independently of the descriptor, musical facet and time series model we consider. Moreover, we show that our findings are not only of theoretical relevance but can also have practical impact. To this end we demonstrate that music predictability at relatively long time intervals can be exploited in a real-world application, namely the automatic identification of cover songs (i.e. different renditions or versions of the same musical piece). Importantly, this prediction strategy yields a parameter-free approach for cover song identification that is substantially faster, allows for reduced computational storage and still maintains highly competitive accuracies when compared to state-of-the-art systems.

Chroma Binary Similarity and Local Alignment Applied to Cover Song Identification

IEEE Transactions on Audio, Speech, and Language Processing, 2008

We present a new technique for audio signal comparison based on tonal subsequence alignment and i... more We present a new technique for audio signal comparison based on tonal subsequence alignment and its application to detect cover versions (i.e., different performances of the same underlying musical piece). Cover song identification is a task whose popularity has increased in the Music Information Retrieval (MIR) community along in the past, as it provides a direct and objective way to evaluate music similarity algorithms. This article first presents a series of experiments carried out with two state-of-the-art methods for cover song identification. We have studied several components of these (such as chroma resolution and similarity, transposition, beat tracking or Dynamic Time Warping constraints), in order to discover which characteristics would be desirable for a competitive cover song identifier. After analyzing many cross-validated results, the importance of these characteristics is discussed, and the best-performing ones are finally applied to the newly proposed method. Multiple evaluations of this one confirm a large increase in identification accuracy when comparing it with alternative state-of-the-art approaches.

New technologies for jingju music research: Characterisation of role-types using computational tools

by Rafael Caro Repetto and Xavier Serra

The development of new computational technologies has brought a new face to music research, givin... more The development of new computational technologies has brought a new face to music research, giving birth to new disciplines, such as computational musicology and, more recently, computational ethnomusicology. One of the more significant contributions of these approaches is the incorporation of quantitative and statistical information, as well as the possibility of easily and effectively analysing large corpora. The CompMusic project, carried out in the Music Technology Group (Universitat Pompeu Fabra, Barcelona) and funded by the European Research Council, is developing computational tools for the analysis of different music traditions, including jingju.
Analyses of jingju music to date have consisted mostly either in qualitative, in-depth analyses of selected sample arias, or general explanations of theoretical principles. A remarkable case of use of statistical data for the enhancement of comparative analysis is Bell Yung's work on Cantonese opera music (1989). In the framework of the CompMusic project, we aim to contribute to the development of a new face for Chinese music research by taking advantage of state of the art technologies in the fields of music information retrieval and sound and music computing.
In this paper we introduce firstly the research framework of the CompMusic project for bringing new technologies to traditional music research, focusing on the case of jingju. To demonstrate the applications of the computational tools developed within this framework, we present a series of algorithms that are being designed specifically for the analysis of jingju arias. These algorithms work both on scores in machine readable format and audio recordings, drawing respectively on the Music21 toolkit for score analysis and the Essentia library for audio analysis. For this paper, we explore the possibilities these tools offer for the characterisation of the singing style of three jingju role-types, namely dan, laosheng and jing. Finally, we discuss the preliminary results obtained from these analyses, pointing out the possibilities for further research, as well as the challenges to be addressed in future work.

Computational tools for assisting teaching of alien music traditions: The Musical Bridges project

by Rafael Caro Repetto and Xavier Serra

The development of information and communication technologies has granted worldwide access to mus... more The development of information and communication technologies has granted worldwide access to music digital traces (Serra 2017) of music traditions that are alien to the listener, but not necessarily to pathways for their comprehension and appreciation. On the other hand, courses on “world musics” are increasingly common in official curricula. These circumstances have fostered the publication of related educational materials. Most of these publications—such as World Sound Matters by Stock (1996) or the Global Music Series edited by Wade and Campbell (2003–2012), among many others—consist on textual explanations illustrated by audio recordings. This guided listening poses two important challenges to the learner, such us the difficulty of “hearing” the explanations on the audio, especially when confronted to an alien sonic world, and the limited number of music examples covered, whose detailed analysis and comprehension might not guarantee, or might even have an overfitting effect for, understanding and appreciating “real life” performances.
We argue that the discipline of Music Information Retrieval (MIR) offers opportunities to address the aforementioned challenges. MIR technologies can be used to develop tools for visualizing musical features, thus complementing “attentive listening” with visual cues, and for interacting with such features, which encourages “engaged listening” (as defined by Campbell 2004). Furthermore, these tools can be automatically implemented on large corpora of recordings, thus widening the music samples to which the learner is exposed. In this paper we introduce Musical Bridges, a just launched project which aims at developing such tools for assisting learning of alien music traditions. It draws on the corpora and technologies developed in the CompMusic project for Hindustani, Carnatic, Turkish makam, Arab-Andalusian and jingju music. We present the goals and methodology of the project, discuss its opportunities and challenges, and present the first prototypes for supporting Hindustani and jingju music teaching.

Computational tools for aiding appreciation of music cultures: Jingju as a test case

by Rafael Caro Repetto and Xavier Serra

The development of IT has had a dramatic impact in the dissemination of musical artefacts, especi... more The development of IT has had a dramatic impact in the dissemination of musical artefacts, especially audio recordings, across cultures. However, this increasing accessibility to large repertories of music recordings does not imply accessibility to their understanding and consequently to their appreciation. On the other hand, the young but continuously growing discipline of Music Information Retrieval (MIR) is creating computational tools for automatic analysis and content based management of large collections of music data. Putting these tools at the service of culturally aware music analysis might provide new platforms for aiding the understanding and appreciation of alien music cultures.
In this paper I explore the use of MIR tools for helping non specialized listeners understand and appreciate jingju (aka Peking opera) music. To this aim, I draw on the corpus for jingju music research created by the CompMusic project (http://compmusic.upf.edu) at the Music Technology Group (Universitat Pompeu Fabra, Barcelona), from which I select representative examples. According to the characteristics of this specific tradition, I either manually annotate or use the Essentia library to automatically extract corresponding features, which are then used for creating visualization tools using D3.js or Processing. These tools aim at guiding the listener’s attention towards the musical elements that are relevant for the appreciation of this tradition through visual cues. The final goal is to test their potential, if made available online, for helping the understanding and appreciation of jingju music, and therefore contribute to its global dissemination.

NACTA: construyendo el futuro de la tradición del jingju

by Rafael Caro Repetto and Xavier Serra

En sus poco más de doscientos años de historia, el jingju, u ópera de Pekín, ha conseguido el tít... more En sus poco más de doscientos años de historia, el jingju, u ópera de Pekín, ha conseguido el título de tesoro nacional (guocui) y como tal ha asumido la turbulenta historia reciente del país. En la puja por situarse como potencia internacional, China se esfuerza en promover este emblema cultural ante una audiencia global, consiguiendo de hecho haber sido incluido en la Lista Representativa del Patrimonio Cultural Inmaterial de la Humanidad, al tiempo que la audiencia local es cada vez mayor en edad, pero menor en número. En el centro de estas tensiones sociales y políticas, y en cierta medida ajena a las mismas, se encuentra la National Academy of Chinese Theatre Arts (Zhongguo Xiqu Xueyuan, NACTA), la principal institución de educación superior para actores, actrices, músicos, directores, compositores, escenógrafos y todo profesional relacionado con el jingju. Su cometido es el de preservar la tradición que ha de identificar el legado cultural con el que el país se presenta ante la comunidad global, la misma que se aleja cada vez más de su sociedad contemporánea. Como resultado del trabajo de campo desarrollado en este centro, esta ponencia estudia NACTA como espacio de negociación entre las diferentes dinámicas que operan sobre este género. A partir de entrevistas y conversaciones con alumnos y profesores, y de la participación directa en sus actividades formativas, los autores analizan los cambios en los modelos de formación y su repercusión en la transmisión / recreación de la tradición, así como la influencia de las políticas culturales, los continuos cambios sociales, la competencia con nuevos modelos de consumo de ocio, la (falta de) incorporación de las nuevas tecnologías y la idiosincrasia de las nuevas generaciones de jóvenes intérpretes frente a estos retos.

Melodic transformation processes in the arrangements of jingju banshi

by Rafael Caro Repetto and Xavier Serra

Music in jingju (also known as Peking or Beijing opera) is predefined by tradition. Before the ap... more Music in jingju (also known as Peking or Beijing opera) is predefined by tradition. Before the appearance of jingju music composers, actors used to “arrange melodies” (bianqu) for new lyrics according to melodic systems called shengqiang. Each shengqiang consists in a distinctive melodic framework which is transformed rhythmically into predefined metrical patterns called banshi to convey different emotions. If this is common knowledge in musicological literature and among performers, an analysis of how this melodic material is transformed is still to be undertaken. In this paper we present a preliminary approach to this topic, by implementing a computer aided comparative analysis. To this aim, we focus on three banshi in the xipi shengqiang as sang by the dan role-type, namely yuanban, which is considered to convey the “original” melody, manban, obtained by stretching yuanban, and kuaiban, a compression of yuanban. In order to ensure representativeness, we have gathered those arias quoted as example in several jingju music textbooks to build our dataset. We obtain a representation of the underlying melodic structure by comparing yuanban arias, search for trends in its transformation processes towards manban and kuaiban, and complement these results with statistical information computed from scores using the Music21 toolkit.