Proc. of the 9th Int. Conference on Digital Audio Effects (DAFx-06), Montreal, Canada, September 18-20, 2006
AN INTERDISCIPLINARY APPROACH TO AUDIO EFFECT CLASSIFICATION
Vincent Verfaille
Catherine Guastavino
SPCL / CIRMMT, McGill University
Montréal, Qc, Canada
[email protected]
GSLIS / CIRMMT, McGill University
Montréal, Qc, Canada
[email protected]
Caroline Traube
LIAM, Université de Montréal
Montréal, Qc, Canada
[email protected]
ABSTRACT
The aim of this paper is to propose an interdisciplinary classification of digital audio effects to facilitate communication and
collaborations between DSP programmers, sound engineers, composers, performers and musicologists. After reviewing classifications reflecting technological, technical and perceptual points of
view, we introduce a transverse classification to link discipline-specific classifications into a single network containing various
layers of descriptors, ranging from low-level features to high-level
features. Simple tools using the interdisciplinary classification are
introduced to facilitate the navigation between effects, underlying
techniques, perceptual attributes and semantic descriptors. Finally,
concluding remarks on implications for teaching purposes and for
the development of audio effects user interfaces based on perceptual features rather than technical parameters are presented.
1. INTRODUCTION

More than 70 different digital audio effects have previously been identified 1 ([1] and [2]). Digital audio effects are tools used by composers, performers and sound engineers, but they are generally described from the standpoint of the DSP engineers who designed them. They are therefore documented and classified in terms of the underlying techniques and technologies, both in software documentation and in textbooks.

However, other classification schemes are commonly used by different communities. These include perceptual classification [3, 2], signal processing classification [4, 5, 6, 7, 1], control type classification [8], and sound and music computing classification [9]. The comparison of these classifications reveals strong differences. Specifically, each classification was introduced to best meet the needs of a specific audience, thus relying on features that are relevant for a given community but may be meaningless or obscure for a different community. For example, signal processing techniques are rarely presented according to perceptual features but rather according to acoustical dimensions. Conversely, composers usually rely on perceptual or cognitive features rather than acoustical dimensions.

The main motivation for an interdisciplinary approach to audio effect classification is to facilitate communication between researchers and creators working on or with audio effects (e.g. DSP programmers, sound engineers, sound designers, electroacoustic music composers, performers using augmented or extended acoustic instruments or digital instruments, musicologists). The disciplines concerned are acoustics, electrical engineering, psychoacoustics, music cognition and psycholinguistics.

In the next section, vocabulary is clarified regarding audio effects vs. sound effects and sound transformations. In section 3, we present the various standpoints on digital audio effects through a description of the communication chain in music. In section 4, we describe three discipline-specific classifications 2, based on underlying techniques, control signals and perceptual attributes. In section 5, we introduce an interdisciplinary classification linking the different layers of domain-specific descriptors, before concluding with remarks on pedagogy and human-computer interfaces in section 6.

1 This value (70) is highly dependent on the degree of refinement as well as the experience and goals of the user, and is therefore highly subjective!

2 It should be noted that the presented classifications are not classifications in the strict sense of the term, since they are neither mutually exclusive (one effect can be classified in more than one class) nor exhaustive.
2. SOME CLARIFICATION ON VOCABULARY
Audio Effect or Sound Effect? The word ‘effect’ denotes an impression produced in the mind of a person, a change in perception resulting from a cause.
Sound effects and audio effects denote two related but different concepts. Sound effects are sounds that affect us, whereas audio effects are transformations applied to sounds in order to modify how they affect us. Sound effect databases provide natural (recorded) and processed sounds (resulting from audio effects) that produce specific effects on perception, used to simulate actions, interactions or emotions in various contexts (e.g. music, movie soundtracks).
In the field of audio effects, the meaning of ‘effect’ has shifted
from the perception of a change to the processing technique used to
produce this change, reflecting a semantic confusion between what
is perceived (the effect of processing) and the signal processing
technique used to achieve this effect.
Audio Effect or Sound Transformation? Distinctions between audio effects and sound transformations originated from technical and/or historical considerations that distinguish simple vs. complex and surface vs. deep processing.
Audio effects originally denoted simple processing systems based on simple mathematical operations, e.g. chorus by random control of a delay-line modulation, echo by a delay line, or distortion by non-linear processing. It was assumed that audio effects process sound at its surface, since the sound is represented by waveform samples (not a high-level sound model) and simply processed by delay lines, filters, gains, etc. Sound transformations, on the other hand, denoted complex processing systems based on analysis–synthesis models (high-level sound models such as additive and subtractive models). They were considered to offer deeper modifications, such as high-quality pitch-shifting with formant preservation, timbre morphing, and time-scaling with attack, pitch and panning preservation.
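As a minimal sketch of such ‘surface’ processing (assuming a mono signal in a NumPy array; function names and parameter values are illustrative, not taken from the references), an echo can be written as a feedback delay line and a distortion as a memoryless non-linearity:

import numpy as np

def echo(x, sr, delay_s=0.25, feedback=0.4, mix=0.5):
    """Echo as a feedback delay line operating directly on the samples."""
    d = int(delay_s * sr)               # delay length in samples
    y = np.array(x, dtype=float)
    buf = np.zeros(d)                   # circular delay buffer
    for n in range(len(x)):
        delayed = buf[n % d]
        y[n] = x[n] + mix * delayed
        buf[n % d] = x[n] + feedback * delayed
    return y

def distortion(x, drive=5.0):
    """Distortion as a memoryless non-linear waveshaper."""
    return np.tanh(drive * np.asarray(x))

sr = 8000                               # illustrative sampling rate
t = np.arange(sr) / sr                  # one second of signal
x = 0.5 * np.sin(2 * np.pi * 220 * t)   # stand-in for an input sound
y = distortion(echo(x, sr))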
Over time, practice blurred the boundaries between audio effects and sound transformations. Indeed, sound transformations can be used to produce the same changes as audio effects, but often at a higher processing cost (e.g. panning or comb filtering in the spectral domain). Moreover, some audio effects considered as simple processing actually require complex processing. For instance, reverberation systems are usually considered as simple audio effects because they were originally developed using simple operations, even though they apply complex sound transformations. The
surface/depth distinction is also confusing because it can refer to
either the technique used or to the effect it has on perception. For
example, a distortion effect is a simple surface transformation from
a technical point of view, but has a strong impact on perception
and could therefore be described as a deep transformation from
the perceptual standpoint. Conversely, a time-scaling with pitch,
formants and attack preservation requires complex techniques but
is perceived as a simple and clear transformation by the listener.
Therefore, we consider that the terms ‘audio effects’, ‘sound
transformations’ and ‘musical sound processing’ all refer to the
same process: applying signal processing techniques to sounds in
order to modify how they will be perceived, or in other words, to
transform a sound into another sound with a different quality [10].
The different terms are often used interchangeably. For the sake of consistency, we will use ‘audio effects’ throughout this paper.
3. MULTIPLE STANDPOINTS ON AUDIO EFFECTS
Despite the variety of needs and standpoints, the technological terminology is predominantly employed by the actual users of audio effects: composers and performers. This technological classification might be the most rigorous and systematic one, but it unfortunately only refers to the techniques used while ignoring our perception of the resulting audio effects, which seems more relevant in a musical context.
Figure 1: Communication chain in music, linking the instrument maker, the composer, the score (aesthetic limits), the instrument (physical limits), the performer, the sound and the auditor. The composer, performer and instrument maker are also listeners, but in a different context than the auditor.
The communication chain in music [11, 12] essentially
produces musical sounds. This concept has been adapted from
linguistics and semiology to music [13], based on [14], in a
tripartite semiological scheme distinguishing three levels of
musical communication between a composer (producer) and
a listener (receiver) through a physical, neutral trace such as
sounds. In order to investigate all possible standpoints on audio
effects, we apply this scheme to a complete chain, as depicted
on Fig. 1, including all actors intervening in the processes of the
conception, creation and perception of music (instrument makers,
composers, performers and listeners). The poietic level concerns
the conception and creation of a musical message, in which instrument makers, composers and performers participate in different ways and at different stages. The neutral level is that of
the physical ‘trace’ (instruments, sounds or scores). The aesthesic
level corresponds to the perception and reception of the musical
message by a listener. In the case of audio effects, the instrument
maker is the signal processing engineer who designs the effect
and the performer is the user of the effect (musician, sound
engineer). In the context of mixed music creation, composers,
performers and instrument makers (music technologists) are
usually distinct individuals who need to efficiently communicate
with one another. But all actors in the chain are also listeners who
can share descriptions of what they hear and how they interpret
it. Therefore, we will consider the perceptual and cognitive standpoints as the entry point to the proposed interdisciplinary network of the various domain-specific classifications.
We also consider the specific case of electroacoustic music
composers who often combine additional programming and performance skills. They conceive their own processing system, control and perform on their instruments. Although all production
tasks are performed by a single multidisciplinary artist in this case,
a transverse classification is still helpful to achieve a better awareness of the relations between the different levels of description of
an audio effect (from technical to perceptual standpoints).
4. DISCIPLINE-SPECIFIC CLASSIFICATIONS
4.1. Classification Based on Underlying Techniques
The first classification we present is from the ‘instrument maker’
standpoint (DSP engineer or programmer). It is focused on the
underlying techniques used to implement the audio effects. Many
digitally implemented effects are in fact emulations of their analog
ancestors 3. We distinguish the following analog technologies [2]:
• mechanics/acoustics (e.g. musical instruments, effects due
to room acoustics);
• electromechanics (e.g. vinyl records: pitch-shifting by changing the disk rotation speed);
• electromagnetics (e.g. magnetic tapes: flanging);
• electronics (e.g. filters, vocoder, ring modulators).
For example, flanging was originally obtained by pressing a thumb on the flange of a tape recorder reel, and is now emulated with digital comb filters with time-varying delays. Digital ring modulation (referring to the multiplication of two signals) borrows its
name from the analog ring-shaped circuit of diodes originally used
to implement this effect. Other digital effects emulating acoustical
or perceptual properties of electromechanic, electric or electronic
3 Similarly, some analog audio effects implemented with one technique
were emulating audio effects already existing with another analog technique. At some point, analog and/or digital techniques were also creatively
used so as to provide new effects.
Figure 2: A technical classification of audio effects used to design a multi-effects system, grouping effects (e.g. chorus, flanger, equalizer, tremolo, compressor, time-scaling, pitch-shifting, robotization, cross-synthesis) under the technical domains and techniques involved (time-domain and frequency-domain block processing, delay lines, filters, gains, oscillator banks, etc.). ‘TD’ stands for ‘time domain’, ‘FD’ for ‘frequency domain’, ‘t-scale’ for ‘time-scaling’, ‘p-shift’ for ‘pitch-shifting’, ‘+’ for ‘with’, ‘A-’ for ‘adaptive control’, ‘SE’ for ‘spectral envelope’, ‘osc.’ for ‘oscillator’, ‘mod.’ for ‘modulation’ and ‘modif.’ for ‘modification’. Bold italic words denote technical aspects, whereas regular font words denote audio effects.
effects include filtering, the wah-wah effect, the vocoder effect,
reverberation, echo and Leslie effect.
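As an illustration of the last point (a minimal sketch with illustrative signals, assuming NumPy arrays), digital ring modulation is simply the pointwise multiplication of two signals:

import numpy as np

def ring_modulation(x, carrier):
    """Digital ring modulation: pointwise multiplication of two signals."""
    return np.asarray(x) * np.asarray(carrier)

sr = 8000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 220 * t)      # stand-in for an input sound
carrier = np.sin(2 * np.pi * 440 * t)      # sinusoidal carrier
y = ring_modulation(x, carrier)            # sidebands at 660 Hz and 220 Hz (sum and difference)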
Digital audio effects can be organized on the basis of implementation techniques, as proposed in [1]:
• filters;
• delays (resampling);
• modulators and demodulators;
• nonlinear processing;
• spatial effects;
• time-segment processing, e.g. SOLA, PSOLA [15, 16];
• time-frequency processing, e.g. the phase vocoder [17, 18];
• source-filter processing (e.g. LPC);
• spectral processing (e.g. SMS [19]);
• time and frequency warping.
As shown in this list, a sub-classification of digital audio effects can be based on the domain of application (time, frequency
or time-frequency) and on whether the processing is performed
sample-by-sample or block-by-block [2]:
• time domain: block processing, e.g. OLA, SOLA, PSOLA; sample processing, e.g. delay line, gain, non-linear processing, resampling/interpolation;
• frequency domain (block processing): frequency-domain synthesis (IFFT), e.g. phase vocoder with or without phase unwrapping; time-domain synthesis (oscillator bank);
• time and frequency domain: e.g. phase vocoder and resampling, phase vocoder and LPC.
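The sample-by-sample vs. block-by-block distinction can be sketched as follows (an illustrative NumPy sketch, not an implementation from [2]; the per-block spectral modification is a placeholder):

import numpy as np

def process_by_samples(x, gain=0.8):
    """Sample-by-sample processing in the time domain (here, a plain gain)."""
    y = np.empty(len(x))
    for n in range(len(x)):
        y[n] = gain * x[n]
    return y

def process_by_blocks(x, block=1024, hop=512):
    """Block-by-block processing in the frequency domain: window, FFT,
    per-block spectral modification, inverse FFT, then overlap-add."""
    win = np.hanning(block)
    y = np.zeros(len(x))
    for start in range(0, len(x) - block, hop):
        frame = x[start:start + block] * win
        spectrum = np.fft.rfft(frame)
        spectrum[len(spectrum) // 2:] = 0.0        # placeholder spectral modification
        y[start:start + block] += np.fft.irfft(spectrum) * win
    return y

sr = 8000
x = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
y = process_by_blocks(process_by_samples(x))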
The advantage of a classification based on the underlying techniques is that the developer can easily see the technical and implementation similarities of various effects, thus simplifying understanding and implementation of multi-effect systems. However, some audio effects can be classified in more than one class.
Figure 3: Technical classification of some source-filter effects based on signal components (source/filter) and applied mathematical operations (grey-shaded background). Dashed bold lines and italic effect names indicate the use of adaptive control.
For instance, time-scaling can be performed with time-segment
and time-frequency processing. Depending on the user's expertise (DSP programmer, electroacoustic composer), this classification
may not be the easiest to understand. In fact, this type of classification does not explicitly handle perceptual features.
Users may choose between several possible implementations of an effect depending on the artifacts of each implementation. For instance, with time-scaling, resampling preserves neither pitch nor formants; OLA with a circular buffer adds the window modulation and sounds rougher and filtered; the phase vocoder sounds a bit reverberant; the ‘sinusoidal + noise’ additive model sounds good except for attacks; the ‘sinusoidal + transients + noise’ additive model preserves attacks, but not the spatial image of multi-channel sounds; etc. Therefore, in order to choose a technique, the user must be aware of the audible artifacts of each technique. The need to link implementation techniques to perceptual features (see 4.3) thus becomes clear.
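The first artifact mentioned above can be made concrete with a small sketch (assuming linear-interpolation resampling; the stretch value is illustrative): time-scaling by plain resampling stretches the duration, but transposes the pitch at the same time.

import numpy as np

def resample_time_scale(x, stretch=1.5):
    """Time-scaling by plain resampling (linear interpolation). The duration is
    multiplied by `stretch`, but so is the period of every partial, so the pitch
    is transposed by -12*log2(stretch) semitones: neither pitch nor formants are
    preserved, which is the artifact mentioned above."""
    old_idx = np.arange(len(x))
    new_idx = np.linspace(0, len(x) - 1, int(len(x) * stretch))
    return np.interp(new_idx, old_idx, x)

stretch = 1.5
print("pitch shift caused by resampling: %.2f semitones" % (-12 * np.log2(stretch)))
# -7.02 semitones: stretching to 150% of the original duration lowers the pitch by a fifth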
As depicted in Fig. 2, some audio effects (e.g. adaptive time-scaling with time-synchronization [10]) can be performed with
techniques from various domains (e.g. SOLA: block-by-block and
time-domain; phase vocoder: block-by-block frequency domain
with IFFT synthesis). This classification can be used to design a
multi-effect system, as well as to provide an overview of technical domains and signal processing techniques involved in effects.
However, it may not be relevant for the listener.
Another classification relies on the mathematical operations applied to the components of the sound representation used to process the sound. For example, with source-filter model based audio effects, the basic mathematical operations that can be applied to the filter, the source or both components [20] are scaling, shifting, warping, multiplying, identity and interpolation (see Fig. 3).
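These operations can be sketched on a spectral envelope sampled on a regular frequency grid (an illustrative sketch only; how the envelope is estimated, e.g. by LPC or cepstrum, is left out, and all parameter values are arbitrary):

import numpy as np

# The operations of Fig. 3 applied to a spectral envelope `env` sampled at `freqs` (Hz).

def identity(env, freqs):
    return env

def scale(env, freqs, a=1.2):
    return np.interp(freqs, freqs * a, env)        # stretch the envelope along frequency

def shift(env, freqs, df=200.0):
    return np.interp(freqs, freqs + df, env)       # translate the envelope upward by df

def warp(env, freqs, gamma=0.8):
    warped_axis = freqs.max() * (freqs / freqs.max()) ** gamma
    return np.interp(freqs, warped_axis, env)      # non-linear frequency warping

def multiply(env1, env2):
    return env1 * env2                             # e.g. cross-synthesis of two envelopes

def interpolate(env1, env2, t=0.5):
    return (1 - t) * env1 + t * env2               # e.g. envelope morphing

freqs = np.linspace(0.0, 4000.0, 256)
env = np.exp(-freqs / 1500.0)                      # toy decaying envelope
shifted = shift(env, freqs)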
4.2. Classification Based on the Type of Control
The second discipline-specific classification takes the standpoint
of the composer and the performer, and is based on the type of
control that audio effects offer or require [2, 8]: constant or variable. If variable, the control can be provided by a wave generator – a periodic or low-frequency oscillator (LFO): sinusoidal, triangular, exponential/logarithmic – or by arbitrary generators, such as gestural control (real-time user-defined control), automation (offline user-defined control) and adaptive control (sound-defined control, using sound descriptors that represent musical gestures). This classification complements the previous ones and appeals to performers, composers and developers. It can also help define a general framework and design new audio effects [2, 10].
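The control types listed above can be sketched as control-signal generators (an illustrative NumPy sketch; the short-term RMS energy stands in for the sound descriptors mentioned above):

import numpy as np

def lfo_control(n, sr, rate_hz=2.0):
    """Periodic control signal (sinusoidal LFO), normalised to [0, 1]."""
    t = np.arange(n) / sr
    return 0.5 * (1.0 + np.sin(2 * np.pi * rate_hz * t))

def adaptive_control(x, block=512):
    """Adaptive control: derive the parameter from a sound descriptor, here the
    short-term RMS energy, held constant over each block and normalised to [0, 1]."""
    rms = np.array([np.sqrt(np.mean(x[i:i + block] ** 2))
                    for i in range(0, len(x), block)])
    rms = np.repeat(rms, block)[:len(x)]
    return rms / (rms.max() + 1e-12)

sr = 8000
x = 0.5 * np.sin(2 * np.pi * 220 * np.arange(2 * sr) / sr) * np.linspace(0, 1, 2 * sr)
constant = np.full(len(x), 0.3)      # constant control
lfo = lfo_control(len(x), sr)        # wave-generator (LFO) control
adaptive = adaptive_control(x)       # sound-defined (adaptive) control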
4.3. Classification Based on Perceptual Attributes
Finally, audio effects can be classified according to the various
perceptual attributes they modify:
• pitch: height and chroma; melody, intonation, contour, harmony, harmonicity; glissando;
• dynamics: nuances, phrasing (legato and pizzicato), accents, tremolo;
• time: duration, tempo, rhythm, accelerando/decelerando;
• space: localization (distance, azimuth, elevation) and sound motion; room effect (reverberation, echo); directivity;
• timbre: formants (color); brightness (or spectral height), quality; metamorphosis; texture, harmonicity; vibrato, trill, Flatterzunge, legato, pizzicato.
This classification was introduced in the context of content-based
transformations [3], and adaptive audio effects [2, 10].
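Such a perceptual classification can be encoded directly, for instance as a mapping from each effect to its main and secondary attributes; the few entries below are illustrative examples consistent with the classification above, not an exhaustive list:

# Toy encoding of the perceptual layer: each effect is mapped to the main
# perceptual attribute it modifies and to secondary ones (illustrative entries only).
PERCEPTUAL_LAYER = {
    "tremolo":        {"main": "dynamics", "secondary": []},
    "vibrato":        {"main": "pitch",    "secondary": ["timbre"]},
    "reverberation":  {"main": "space",    "secondary": ["timbre"]},
    "pitch-shifting": {"main": "pitch",    "secondary": ["timbre"]},
    "distortion":     {"main": "timbre",   "secondary": ["dynamics"]},
    "time-scaling":   {"main": "time",     "secondary": []},
}

def effects_modifying(attribute):
    """List the effects whose main modified attribute is `attribute`."""
    return [name for name, attrs in PERCEPTUAL_LAYER.items()
            if attrs["main"] == attribute]

print(effects_modifying("pitch"))   # ['vibrato', 'pitch-shifting']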
For the main audio effects, Table 1 indicates the main and secondary perceptual attributes modified, along with complementary information for programmers and users about real-time implementation and control type. Another way to represent the various links between an effect and perceptual attributes uses a heuristic map [21], as depicted in Fig. 4, where audio effects are linked in the center to the main perceptual attribute they modify. Other sub-attributes are also introduced. For the sake of simplicity, audio effects are attached to the center only for the main modified perceptual attributes. When other attributes are slightly modified, they are indicated on the opposite side, i.e. at the figure bounds.
Table 1: Digital audio effects according to the perceptual attributes they modify (L: loudness, D: duration and rhythm, P: pitch and harmony, T: timbre and quality, S: space), listing for each effect the main and secondary attributes modified, whether a real-time (RT) implementation is possible (‘—’ means ‘not possible’), and the built-in control type (A: adaptive, cross-A: cross-adaptive, LFO). The effects covered are: compressor, limiter, expander, noise gate, gain/amplification, normalisation, tremolo, violoning, inversion, time-scaling with formant preservation, time-scaling with attack preservation, time-scaling with vibrato preservation, rhythm/swing change, pitch-shifting with and without formant preservation, pitch change, pitch discretization (auto-tune), smart harmony, (in-)harmonizer, distance change, directivity, Doppler, echo, granular delay, panning, reverberation, spectral panning, Rotary/Leslie, filter, arbitrary-resolution filter, comb filtering, equalizer, resonant filter, wah-wah, auto-wah, envelope shifting, envelope scaling, envelope warping, spectral centroid change, chorus, flanger, phaser, spectrum shifting, adaptive ring modulation, texture change, distortion, fuzz, overdrive, mutation, spectral interpolation, vocoding, cross-synthesis, voice morphing, timbral metamorphosis, timbral morphing, whispering/hoarseness, de-esser, declicking, denoising, exciter, enhancer, spectral compressor, gender change, intonation change, martianisation, prosody change, resampling, ring modulation, robotisation, spectral tremolo, spectral warping, time shuffling and vibrato.
Figure 4: Perceptual classification of various audio effects, represented as a heuristic map linking each effect to the main perceptual attribute it modifies (loudness, duration, pitch, timbre, space) and to perceptual sub-attributes (formants, harmonicity, brightness, localization, room, quality, directivity, etc.). Bold italic words are perceptual attributes (pitch, loudness, etc.), italic words are perceptual sub-attributes (formants, harmonicity, etc.), and the other words refer to the corresponding audio effects.
When other perceptual attributes are slightly modified by an audio effect, those links are not drawn to the center, in order not to overload the heuristic map, but rather toward the outer edge of the map.
When the user chooses an effect to modify one attribute, the
implementation technique used may introduce artifacts, implying
modifications of other attributes. We differentiate the perceptual
attributes that we primarily want to modify (‘main’ perceptual attributes, and the corresponding dominant modification perceived)
and the ‘secondary’ perceptual attributes that are slightly modified
(on purpose or as a by-product of the signal processing).
A perceptual classification has the advantage of presenting audio effects according to the way they are perceived, taking into
account the audible artifacts of the implementation techniques. Of
course, none of the proposed classifications is perfect, and each depends on the goal we have in mind when using it. However, for sharing and spreading knowledge about audio effects between DSP programmers, composers and listeners, this classification offers a vocabulary dealing with our auditory perception of the sound produced by the audio effect, which we all share since we are all listeners in the communication chain (see section 1).
5. INTERDISCIPLINARY CLASSIFICATIONS
This section recalls sound effect classifications and then introduces
an interdisciplinary classification linking the different layers of domain-specific descriptors.
5.1. Sound Effects Classifications
Sound effects have been thoroughly investigated in electroacoustic music. For instance, Schaeffer classified sounds according to
matter (harmonicity and noisiness), form (dynamics) and variation (melodic profile) [22]. In the context of ecological acoustics, Schafer [23] introduced the idea that soundscapes reflect human activities. He proposed four main categories of environmental
sounds: mechanical sounds (traffic and machines), human sounds
(voices, footsteps), collective sounds (resulting from social activities) and sounds conveying information about the environment
(warning signals or spatial effects).

Figure 5: Transverse diagram for the chorus effect, linking semantic descriptors (‘warm sound’, ‘several performers’), the perceptual attribute (timbre), the control type, the applied processing (transposition, time-scaling, resampling), the processing domain and the digital implementation technique. The chorus is applied by mixing modified versions of the signal (slight pitch-shifting, time-scaling and spectral envelope modification), either by using the usual white-noise-modulated delay line, or by using a sound model (such as the phase vocoder, SOLA or an additive model).
Gaver [24] also introduced the distinction between musical listening and everyday listening. Musical listening focuses on perceptual attributes of the sound itself (e.g. pitch, loudness), whereas
everyday listening focuses on events to gather relevant information
about our environment (e.g. a car approaching), that is, not about the
sound itself but rather about sound sources and actions producing
sound. Recent research on soundscape perception validated this
view by showing that people organize familiar sounds on the basis
of source identification. But there is also evidence that the same
sound can give rise to different cognitive representations which integrate semantic features (e.g. meaning attributed to the sound)
into physical characteristics of the acoustic signal (see [25] for a
review). Therefore, semantic features must be taken into consideration when classifying sounds, but they cannot be matched with
physical characteristics in a one-to-one relationship.
Similarly, audio effects give rise to different semantic interpretations depending on how they are implemented or controlled.
Semantic descriptors were investigated in the context of distortion
[26] and different standpoints on reverberation were summarized
in [27]. Our contribution is to propose an interdisciplinary classification of audio effects in an attempt to bridge the gaps between
discipline-specific classifications by extending previous research
on isolated audio effects.
5.2. Interdisciplinary Audio Effects Classification
The proposed interdisciplinary classification links the various
layers of discipline-specific classifications presented in section
4, ranging from low-level to high-level features as follows:
digital implementation technique, processing domain, applied
processing, control type, perceptual attributes, and semantic
descriptors. The first example concerns the chorus effect (Fig. 5).
The usual implementation involves one or many delay lines, with
modulated length and controlled by a white noise. An alternative
and more realistic sounding implementation consists in using
several slightly pitch-shifted and time-scaled versions of the same
sound, and mixing them together. In this case, the resulting audio
effect sounds more like a chorus of people or instruments playing
the same harmonic and rhythmic patterns together. The second
example in Fig. 6 illustrates various control types for the wah-wah effect: gestural (with a foot pedal), periodic (with an LFO) and adaptive (controlled by the attack and a release time). We can see here the importance of specifying the control type as part of the effect definition. Another example is delay-line modulation, which results in a chorus when modulated by white noise, and in a flanger when modulated by an LFO.

Figure 6: Transverse diagram for the wah-wah effect: the control type defines the effect's name, i.e. wah-wah (gestural control), automatic wah-wah (with an LFO) or sensitive wah-wah (adaptive control). The wah-wah effect itself can be applied using a delay-line modulation technique, or a sound model (such as the phase vocoder or an additive model).
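The delay-line modulation example above (chorus when modulated by white noise, flanger when modulated by an LFO) can be sketched as a single routine whose modulator determines the resulting effect (an illustrative NumPy sketch; delay ranges and rates are typical rather than prescriptive):

import numpy as np

def modulated_delay(x, sr, max_delay_s, modulator, mix=0.5):
    """Delay line whose instantaneous length is driven by `modulator` (values in [0, 1])."""
    n = np.arange(len(x))
    delay = modulator * max_delay_s * sr                  # delay in samples, per sample
    delayed = np.interp(np.maximum(n - delay, 0), n, x)   # fractional-delay read (linear interp.)
    return x + mix * delayed

def chorus(x, sr):
    """Chorus: the delay modulation is driven by (smoothed) white noise."""
    rng = np.random.default_rng(0)
    noise = rng.uniform(0.0, 1.0, len(x))
    modulator = np.convolve(noise, np.ones(2048) / 2048.0, mode="same")  # crude low-pass
    return modulated_delay(x, sr, max_delay_s=0.030, modulator=modulator)

def flanger(x, sr, rate_hz=0.25):
    """Flanger: the same delay modulation, but driven by an LFO."""
    t = np.arange(len(x)) / sr
    modulator = 0.5 * (1.0 + np.sin(2 * np.pi * rate_hz * t))
    return modulated_delay(x, sr, max_delay_s=0.003, modulator=modulator)

sr = 8000
x = 0.5 * np.sin(2 * np.pi * 220 * np.arange(2 * sr) / sr)
y_chorus, y_flanger = chorus(x, sr), flanger(x, sr)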
6. CONCLUSIONS
This article summarizes various classifications of audio effects elaborated in different disciplinary fields. It then proposes an interdisciplinary classification linking the different layers of domain-specific features, which aims to facilitate knowledge exchange between the fields of musical acoustics, signal processing, psychoacoustics and cognition.
Not only do we address the classification of audio effects, but
we further explain the relationships between structural and control parameters of signal processing algorithms and the perceptual
sub-attributes of timbre modified by audio effects. This interdisciplinary classification emerged from our experience using audio effects and teaching them to engineers and musicians 4. We are convinced that a wider range of audiences, including electronic music composers, would benefit from this interdisciplinary approach [28]. Indeed, a generalization of this classification to all audio effects would have a strong impact on pedagogy, knowledge sharing across disciplinary fields and musical practice. For example, it is well known that DSP engineers conceive better tools when they know how these tools can be used in a musical context. Furthermore, linking perceptual features to signal processing techniques will enable
the development of more intuitive user interfaces providing control over high-level perceptual and cognitive attributes rather than
low-level signal parameters. An example of a system providing a
4 C. Traube teaches sound synthesis and audio effects to electroacoustic
composers at the Université de Montréal, Canada. V. Verfaille teaches
digital audio effects to computer and electrical engineers at the ENSEIRB,
France. C. Guastavino teaches multimedia systems at McGill University.
mapping from perceptual attributes to audio effect control parameters is the Spatializer developed at IRCAM, where the user can
manipulate perceptual attributes to control audio effects [29].
Similar systems are envisaged to generalize this approach to
other audio effects. However, research on semantic descriptors,
the higher-level layer in our interdisciplinary classification, still remains at a very early stage. Further research is needed to investigate verbal descriptors and to identify relationships between semantic features and relevant correlates of lower-level attributes.
Future directions also include the development of tools for
communicating, visualizing, and navigating within the interdisciplinary classification of audio effects, as shown in the figures. Depending on users’ needs and expertise, information for each effect
can be represented at the level of one of the layers corresponding to
domain-specific classifications, or using the interdisciplinary classification to navigate between layers. This information could be
collaboratively collected using a Wiki, and organized as complex
trees and networks, to allow for navigation and exploration. Users
could easily locate features that are relevant to them for the task
at hand. They could then accordingly identify groups of relevant
audio effects, for instance on the basis of perceptual attribute(s)
modified, underlying signal processing techniques, or type of control used, etc. This generalized classification system would best
meet the needs of a wide range of users with differing strategies,
goals and expertise.
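As a sketch of such a navigation tool (illustrative entries only, following the layer names of section 5.2), the classification could be stored as a layered dictionary and queried across layers:

# Layered description of two effects, with a query that navigates from one layer
# (e.g. a perceptual attribute) to the matching effects. Entries are illustrative.
EFFECTS = {
    "chorus": {
        "semantic_descriptors": ["warm sound", "several performers"],
        "perceptual_attributes": ["timbre"],
        "control_type": ["white noise", "adaptive"],
        "applied_processing": ["delay-line length modulation",
                               "pitch-shifting + time-scaling + mixing"],
        "processing_domain": ["time domain", "frequency domain"],
        "implementation_technique": ["delay line", "SOLA", "phase vocoder", "additive model"],
    },
    "wah-wah": {
        "semantic_descriptors": ["closed/opened vowels"],
        "perceptual_attributes": ["timbre"],
        "control_type": ["gestural", "LFO", "adaptive"],
        "applied_processing": ["resonant filter", "vocal filter"],
        "processing_domain": ["time domain", "frequency domain"],
        "implementation_technique": ["time-domain filter", "phase vocoder", "additive model"],
    },
}

def find_effects(layer, value):
    """Which effects match `value` in a given `layer` of the classification?"""
    return [name for name, layers in EFFECTS.items() if value in layers[layer]]

print(find_effects("perceptual_attributes", "timbre"))   # ['chorus', 'wah-wah']
print(find_effects("control_type", "adaptive"))          # ['chorus', 'wah-wah']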
7. REFERENCES
[1] U. Zölzer, DAFX – Digital Audio Effects. J. Wiley & Sons,
2002.
[2] V. Verfaille, “Effets audionumériques adaptatifs : Théorie,
mise en œuvre et usage en création musicale numérique,”
Ph.D. dissertation, Université Aix-Marseille II, 2003.
[3] X. Amatriain, J. Bonada, A. Loscos, J. L. Arcos, and V. Verfaille, “Content-based transformations,” J. New Music Research, vol. 32, no. 1, pp. 95–114, 2003.
[4] S. Orfanidis, Introduction to Signal Processing. Prentice
Hall Int. Editions, 1996.
[5] G. D. Poli, A. Picialli, S. T. Pop, and C. Roads, Musical Signal Processing. Eds. Swets & Zeitlinger, 1996.
[6] C. Roads, The Computer Music Tutorial. Cambridge, Massachusetts: MIT Press, 1996.
[7] F. R. Moore, Elements of Computer Music. Englewood Cliffs, N.J.: Prentice Hall, 1990.
[8] V. Verfaille, M. M. Wanderley, and P. Depalle, “Mapping
strategies for gestural control of adaptive digital audio effects,” J. New Music Research, vol. 35, no. 1, pp. 71–93,
2006.
[9] A. Camurri, G. D. Poli, and D. Rocchesso, “A taxonomy for
sound and music computing,” Computer Music J., vol. 19,
no. 2, pp. 4–5, 1995.
[10] V. Verfaille, U. Zölzer, and D. Arfib, “Adaptive digital audio
effects (A-DAFx): A new class of sound transformations,”
IEEE Trans. Audio, Speech and Language Proc., vol. 14,
no. 5, pp. 1817–1831, 2006.
[11] C. A. Rabassó, “L’improvisation: du langage musical au langage littéraire,” Intemporel: bulletin de la Société Nationale
de Musique, vol. 15, 1995, [Online] http://catalogue.ircam.
fr/hotes/snm/itpr15rabatxt.html.
[12] D. Hargreaves, D. Miell, and R. MacDonald, “What do we mean by musical communication, and why is it important?” introduction to the ‘Musical Communication (Part 1)’ session, in Proc. Int. Conf. on Music Perception and Cognition (ICMPC, CD-ROM), 2004.
[13] J.-J. Nattiez, Fondements d’une sémiologie de la musique.
Paris: U. G. E., Coll. 10/18, 1975.
[14] J. Molino, “Fait musical et sémiologie de la musique,”
Musique en jeu, vol. 17, pp. 37–62, 1975.
[15] E. Moulines and F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones,” Speech Communication, vol. 9, no. 5/6, pp.
453–67, 1990.
[16] J. Laroche, “Time and pitch scale modification of audio signals,” in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds. Kluwer Academic Publishers, 1998, pp. 279–309.
[17] M. Portnoff, “Implementation of the digital phase vocoder
using the fast Fourier transform,” IEEE Trans. Acoust.,
Speech, and Signal Proc., vol. 24, no. 3, pp. 243–8, 1976.
[18] D. Arfib, F. Keiler, and U. Zölzer, “Time-frequency processing,” in DAFX – Digital Audio Effects, U. Zölzer, Ed. J.
Wiley & Sons, 2002, pp. 237–97.
[19] R. J. McAulay and T. F. Quatieri, “Speech analysis/synthesis
based on a sinusoidal representation,” IEEE Trans. Acoust.,
Speech, and Signal Proc., vol. 34, no. 4, pp. 744–754, 1986.
[20] V. Verfaille and P. Depalle, “Adaptive effects based on STFT,
using a source-filter model,” in Proc. Int. Conf. on Digital
Audio Effects (DAFx-04), Naples, Italy, 2004, pp. 296–301.
[21] T. Buzan and B. Buzan, Mind Map Book. Plume, 1996.
[22] P. Schaeffer, Traité des Objets Musicaux. Seuil, 1966.
[23] R. M. Schafer, The Tuning of the World. Knopf: New York,
1977.
[24] W. W. Gaver, “What in the world do we hear? an ecological
approach to auditory event perception,” Ecological Psychology, vol. 5, no. 1, pp. 1–29, 1993.
[25] C. Guastavino, B. F. Katz, J.-D. Polack, D. J. Levitin, and
D. Dubois, “Ecological validity of soundscape reproduction,” Acta Acustica united with Acustica, vol. 91, no. 2, pp.
333–341, 2005.
[26] A. Marui and W. L. Martens, “Perceptual and semantic scaling for user-centered control over distortion-based guitar effects,” in 110th Conv. Audio Eng. Soc., Amsterdam, the
Netherlands, 2001, preprint 5387.
[27] B. Blesser, “An interdisciplinary synthesis of reverberation
viewpoints,” J. Audio Eng. Soc., vol. 49, no. 10, pp. 867–
903, 2001.
[28] L. Landy, “Reviewing the musicology of electroacoustic music: a plea for greater triangulation,” Org. Sound, vol. 4,
no. 1, pp. 61–70, 1999.
[29] J.-P. Jullien, E. Kahle, M. Marin, O. Warusfel, G. Bloch, and
J.-M. Jot, “Spatializer: a perceptual approach,” in 96th Conv.
Audio Eng. Soc., Amsterdam, the Netherlands, # 3465, 1993,
pp. 1–13.