Proc. of the 9th Int. Conference on Digital Audio Effects (DAFx-06), Montreal, Canada, September 18-20, 2006
AN INTERDISCIPLINARY APPROACH TO AUDIO EFFECT CLASSIFICATION
Vincent Verfaille
Catherine Guastavino
SPCL / CIRMMT, McGill University
Montréal, Qc, Canada
[email protected]
GSLIS / CIRMMT, McGill University
Montréal, Qc, Canada
[email protected]
Caroline Traube
LIAM, Université de Montréal
Montréal, Qc, Canada
[email protected]
ABSTRACT
The aim of this paper is to propose an interdisciplinary classification of digital audio effects to facilitate communication and
collaborations between DSP programmers, sound engineers, composers, performers and musicologists. After reviewing classifications reflecting technological, technical and perceptual points of
view, we introduce a transverse classification to link discipline-specific classifications into a single network containing various
layers of descriptors, ranging from low-level features to high-level
features. Simple tools using the interdisciplinary classification are
introduced to facilitate the navigation between effects, underlying
techniques, perceptual attributes and semantic descriptors. Finally,
concluding remarks on implications for teaching purposes and for
the development of audio effects user interfaces based on perceptual features rather than technical parameters are presented.
1. INTRODUCTION

More than 70 different digital audio effects have previously been identified 1 ([1] and [2]). Digital audio effects are tools used by composers, performers and sound engineers, but they are generally described from the standpoint of the DSP engineers who designed them. They are therefore documented and classified in terms of the underlying techniques and technologies, both in software documentation and in textbooks.

However, other classification schemes are commonly used by different communities. These include perceptual classification [3, 2], signal processing classification [4, 5, 6, 7, 1], control type classification [8], and sound and music computing classification [9]. The comparison of these classifications reveals strong differences. Specifically, each classification was introduced to best meet the needs of a specific audience, thus relying on features that are relevant for a given community but may be meaningless or obscure for a different community. For example, signal processing techniques are rarely presented according to perceptual features but rather according to acoustical dimensions. Conversely, composers usually rely on perceptual or cognitive features rather than acoustical dimensions.

The main motivation for an interdisciplinary approach to audio effect classification is to facilitate communication between researchers and creators working on or with audio effects (e.g. DSP programmers, sound engineers, sound designers, electroacoustic music composers, performers using augmented or extended acoustic instruments or digital instruments, musicologists). The disciplines concerned are acoustics, electrical engineering, psychoacoustics, music cognition and psycholinguistics.

In the next section, vocabulary is clarified regarding audio effects vs. sound effects and sound transformations. In section 3, we present the various standpoints on digital audio effects through a description of the communication chain in music. In section 4, we describe three discipline-specific classifications 2, based on underlying techniques, control signals and perceptual attributes. In section 5, we introduce an interdisciplinary classification linking the different layers of domain-specific descriptors, before concluding with remarks on pedagogy and human-computer interfaces in section 6.

1 This value (70) is highly dependent on the degree of refinement as well as the experience and goals of the user, and is therefore highly subjective!

2 It should be noted that the presented classifications are not classifications in the strict sense of the term, since they are neither mutually exclusive (one effect can be classified in more than one class) nor exhaustive.
2. SOME CLARIFICATION ON VOCABULARY
Audio Effect or Sound Effect? The word ‘effect’ denotes an impression produced in the mind of a person, a change in perception resulting from a cause.
Sound effects and audio effects denote two related but different concepts. Sound effects are sounds that affect us, whereas audio effects are transformations applied to sounds in order to modify how they affect us. Sound effect databases provide natural (recorded) and processed sounds (resulting from audio effects) that produce specific effects on perception, used to simulate actions, interactions or emotions in various contexts (e.g. music, movie soundtracks).
In the field of audio effects, the meaning of ‘effect’ has shifted
from the perception of a change to the processing technique used to
produce this change, reflecting a semantic confusion between what
is perceived (the effect of processing) and the signal processing
technique used to achieve this effect.
Audio Effect or Sound Transformation? Distinctions between audio effects and sound transformations originated from technical and/or historical considerations that distinguish simple vs. complex and surface vs. deep processing.
Audio effects originally denoted simple processing systems based on simple mathematical operations, e.g. chorus by random control of a delay-line modulation, echo by a delay line, or distortion by non-linear processing. It was assumed that audio effects process sound at its surface, since the sound is represented by waveform samples (not a high-level sound model) and simply processed by delay lines, filters, gains, etc. Sound transformations, on the other hand, denoted complex processing systems based on analysis–synthesis models (high-level sound models such as additive and subtractive models). They were considered to offer deeper modifications, such as high-quality pitch-shifting with formant preservation, timbre morphing, and time-scaling with attack, pitch and panning preservation.
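As a minimal sketch of such ‘surface’ processing (assuming a mono signal in a NumPy array; function names and parameter values are illustrative, not taken from the references), an echo can be written as a feedback delay line and a distortion as a memoryless non-linearity:

import numpy as np

def echo(x, sr, delay_s=0.25, feedback=0.4, mix=0.5):
    """Echo as a feedback delay line operating directly on the samples."""
    d = int(delay_s * sr)               # delay length in samples
    y = np.array(x, dtype=float)
    buf = np.zeros(d)                   # circular delay buffer
    for n in range(len(x)):
        delayed = buf[n % d]
        y[n] = x[n] + mix * delayed
        buf[n % d] = x[n] + feedback * delayed
    return y

def distortion(x, drive=5.0):
    """Distortion as a memoryless non-linear waveshaper."""
    return np.tanh(drive * np.asarray(x))

sr = 8000                               # illustrative sampling rate
t = np.arange(sr) / sr                  # one second of signal
x = 0.5 * np.sin(2 * np.pi * 220 * t)   # stand-in for an input sound
y = distortion(echo(x, sr))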
Over time, practice blurred the boundaries between audio effects and sound transformations. Indeed, sound transformations can be used to produce the same changes as audio effects, but often at a higher processing cost (e.g. panning or comb filtering in the spectral domain). Moreover, some audio effects considered as simple processing actually require complex processing. For instance, reverberation systems are usually considered as simple audio effects because they were originally developed using simple operations, even though they apply complex sound transformations. The
surface/depth distinction is also confusing because it can refer to
either the technique used or to the effect it has on perception. For
example, a distortion effect is a simple surface transformation from
a technical point of view, but has a strong impact on perception
and could therefore be described as a deep transformation from
the perceptual standpoint. Conversely, a time-scaling with pitch,
formants and attack preservation requires complex techniques but
is perceived as a simple and clear transformation by the listener.
Therefore, we consider that the terms ‘audio effects’, ‘sound
transformations’ and ‘musical sound processing’ all refer to the
same process: applying signal processing techniques to sounds in
order to modify how they will be perceived, or in other words, to
transform a sound into another sound with a different quality [10].
The different terms are often used interchangeably. For the sake of consistency, we will use ‘audio effects’ throughout this paper.
3. MULTIPLE STANDPOINTS ON AUDIO EFFECTS
Despite the variety of needs and standpoints, the technological terminology is predominantly employed by the actual users of audio effects: composers and performers. This technological classification might be the most rigorous and systematic one, but it unfortunately only refers to the techniques used while ignoring our perception of the resulting audio effects, which seems more relevant in a musical context.
Figure 1: Communication chain in music, linking the instrument maker, the composer, the score (aesthetic limits), the instrument (physical limits), the performer, the sound and the auditor. The composer, performer and instrument maker are also listeners, but in a different context than the auditor.
The communication chain in music [11, 12] essentially
produces musical sounds. This concept has been adapted from
linguistics and semiology to music [13], based on [14], in a
tripartite semiological scheme distinguishing three levels of
musical communication between a composer (producer) and
a listener (receiver) through a physical, neutral trace such as
sounds. In order to investigate all possible standpoints on audio
effects, we apply this scheme to a complete chain, as depicted
on Fig. 1, including all actors intervening in the processes of the
conception, creation and perception of music (instrument makers,
composers, performers and listeners). The poietic level concerns
the conception and creation of a musical message, in which instrument makers, composers and performers participate in different ways and at different stages. The neutral level is that of
the physical ‘trace’ (instruments, sounds or scores). The aesthesic
level corresponds to the perception and reception of the musical
message by a listener. In the case of audio effects, the instrument
maker is the signal processing engineer who designs the effect
and the performer is the user of the effect (musician, sound
engineer). In the context of mixed music creation, composers,
performers and instrument makers (music technologists) are
usually distinct individuals who need to efficiently communicate
with one another. But all actors in the chain are also listeners who
can share descriptions of what they hear and how they interpret
it. Therefore, we will consider the perceptual and cognitive standpoints as the entry point to the proposed interdisciplinary network of the various domain-specific classifications.
We also consider the specific case of electroacoustic music
composers who often combine additional programming and performance skills. They conceive their own processing system, control and perform on their instruments. Although all production
tasks are performed by a single multidisciplinary artist in this case,
a transverse classification is still helpful to achieve a better awareness of the relations between the different levels of description of
an audio effect (from technical to perceptual standpoints).
4. DISCIPLINE-SPECIFIC CLASSIFICATIONS
4.1. Classification Based on Underlying Techniques
The first classification we present is from the ‘instrument maker’
standpoint (DSP engineer or programmer). It is focused on the
underlying techniques used to implement the audio effects. Many
digitally implemented effects are in fact emulations of their analog
ancestors 3. We distinguish the following analog technologies [2]:
• mechanics/acoustics (e.g. musical instruments, effects due
to room acoustics);
• electromechanics (e.g. vinyl records: pitch-shifting by changing the disk rotation speed);
• electromagnetics (e.g. magnetic tapes: flanging);
• electronics (e.g. filters, vocoder, ring modulators).
For example, flanging was originally obtained by pressing a thumb on the flange of a tape recorder reel, and is now emulated with digital comb filters with time-varying delays. Digital ring modulation (referring to the multiplication of two signals) borrows its
name from the analog ring-shaped circuit of diodes originally used
to implement this effect. Other digital effects emulating acoustical
or perceptual properties of electromechanic, electric or electronic
3 Similarly, some analog audio effects implemented with one technique
were emulating audio effects already existing with another analog technique. At some point, analog and/or digital techniques were also creatively
used so as to provide new effects.
Figure 2: A technical classification of audio effects used to design a multi-effects system, grouping effects (e.g. chorus, flanger, equalizer, tremolo, compressor, time-scaling, pitch-shifting, robotization, cross-synthesis) under the technical domains and techniques involved (time-domain and frequency-domain block processing, delay lines, filters, gains, oscillator banks, etc.). ‘TD’ stands for ‘time domain’, ‘FD’ for ‘frequency domain’, ‘t-scale’ for ‘time-scaling’, ‘p-shift’ for ‘pitch-shifting’, ‘+’ for ‘with’, ‘A-’ for ‘adaptive control’, ‘SE’ for ‘spectral envelope’, ‘osc.’ for ‘oscillator’, ‘mod.’ for ‘modulation’ and ‘modif.’ for ‘modification’. Bold italic words denote technical aspects, whereas regular font words denote audio effects.
effects include filtering, the wah-wah effect, the vocoder effect,
reverberation, echo and Leslie effect.
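As an illustration of the last point (a minimal sketch with illustrative signals, assuming NumPy arrays), digital ring modulation is simply the pointwise multiplication of two signals:

import numpy as np

def ring_modulation(x, carrier):
    """Digital ring modulation: pointwise multiplication of two signals."""
    return np.asarray(x) * np.asarray(carrier)

sr = 8000
t = np.arange(sr) / sr
x = 0.5 * np.sin(2 * np.pi * 220 * t)      # stand-in for an input sound
carrier = np.sin(2 * np.pi * 440 * t)      # sinusoidal carrier
y = ring_modulation(x, carrier)            # sidebands at 660 Hz and 220 Hz (sum and difference)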
Digital audio effects can be organized on the basis of implementation techniques, as proposed in [1]:
• filters;
• delays (resampling);
• modulators and demodulators;
• nonlinear processing;
• spatial effects;
• time-segment processing, e.g. SOLA, PSOLA [15, 16];
• time-frequency processing, e.g. the phase vocoder [17, 18];
• source-filter processing (e.g. LPC);
• spectral processing (e.g. SMS [19]);
• time and frequency warping.
As shown in this list, a sub-classification of digital audio effects can be based on the domain of application (time, frequency
or time-frequency) and on whether the processing is performed
sample-by-sample or block-by-block [2]:
• time domain: block processing, e.g. OLA, SOLA, PSOLA; sample processing, e.g. delay line, gain, non-linear processing, resampling/interpolation;
• frequency domain (block processing): frequency-domain synthesis (IFFT), e.g. phase vocoder with or without phase unwrapping; time-domain synthesis (oscillator bank);
• time and frequency domain: e.g. phase vocoder and resampling, phase vocoder and LPC.
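The sample-by-sample vs. block-by-block distinction can be sketched as follows (an illustrative NumPy sketch, not an implementation from [2]; the per-block spectral modification is a placeholder):

import numpy as np

def process_by_samples(x, gain=0.8):
    """Sample-by-sample processing in the time domain (here, a plain gain)."""
    y = np.empty(len(x))
    for n in range(len(x)):
        y[n] = gain * x[n]
    return y

def process_by_blocks(x, block=1024, hop=512):
    """Block-by-block processing in the frequency domain: window, FFT,
    per-block spectral modification, inverse FFT, then overlap-add."""
    win = np.hanning(block)
    y = np.zeros(len(x))
    for start in range(0, len(x) - block, hop):
        frame = x[start:start + block] * win
        spectrum = np.fft.rfft(frame)
        spectrum[len(spectrum) // 2:] = 0.0        # placeholder spectral modification
        y[start:start + block] += np.fft.irfft(spectrum) * win
    return y

sr = 8000
x = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
y = process_by_blocks(process_by_samples(x))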
The advantage of a classification based on the underlying techniques is that the developer can easily see the technical and implementation similarities of various effects, thus simplifying understanding and implementation of multi-effect systems. However, some audio effects can be classified in more than one class.
Figure 3: Technical classification of some source-filter effects based on signal components (source/filter) and applied mathematical operations (grey-shaded background). Dashed bold lines and italic effect names indicate the use of adaptive control.
For instance, time-scaling can be performed with time-segment
and time-frequency processing. Depending on the user's expertise (DSP programmer, electroacoustic composer), this classification
may not be the easiest to understand. In fact, this type of classification does not explicitly handle perceptual features.
Users may choose between several possible implementations of an effect depending on the artifacts of each implementation. For instance, with time-scaling, resampling preserves neither pitch nor formants; OLA with a circular buffer adds the window modulation and sounds rougher and filtered; the phase vocoder sounds a bit reverberant; the ‘sinusoidal + noise’ additive model sounds good except for attacks; the ‘sinusoidal + transients + noise’ additive model preserves attacks, but not the spatial image of multi-channel sounds; etc. Therefore, in order to choose a technique, the user must be aware of the audible artifacts of each technique. The need to link implementation techniques to perceptual features (see 4.3) thus becomes clear.
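The first artifact mentioned above can be made concrete with a small sketch (assuming linear-interpolation resampling; the stretch value is illustrative): time-scaling by plain resampling stretches the duration, but transposes the pitch at the same time.

import numpy as np

def resample_time_scale(x, stretch=1.5):
    """Time-scaling by plain resampling (linear interpolation). The duration is
    multiplied by `stretch`, but so is the period of every partial, so the pitch
    is transposed by -12*log2(stretch) semitones: neither pitch nor formants are
    preserved, which is the artifact mentioned above."""
    old_idx = np.arange(len(x))
    new_idx = np.linspace(0, len(x) - 1, int(len(x) * stretch))
    return np.interp(new_idx, old_idx, x)

stretch = 1.5
print("pitch shift caused by resampling: %.2f semitones" % (-12 * np.log2(stretch)))
# -7.02 semitones: stretching to 150% of the original duration lowers the pitch by a fifth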
As depicted in Fig. 2, some audio effects (e.g. adaptive time-scaling with time-synchronization [10]) can be performed with
techniques from various domains (e.g. SOLA: block-by-block and
time-domain; phase vocoder: block-by-block frequency domain
with IFFT synthesis). This classification can be used to design a
multi-effect system, as well as to provide an overview of technical domains and signal processing techniques involved in effects.
However, it may not be relevant for the listener.
Another classification relies on the mathematical operations applied to the components of the sound representation used to process the sound. For example, with source-filter model based audio effects, the basic mathematical operations that can be applied to the filter, the source or both components [20] are scaling, shifting, warping, multiplying, identity and interpolation (see Fig. 3).
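These operations can be sketched on a spectral envelope sampled on a regular frequency grid (an illustrative sketch only; how the envelope is estimated, e.g. by LPC or cepstrum, is left out, and all parameter values are arbitrary):

import numpy as np

# The operations of Fig. 3 applied to a spectral envelope `env` sampled at `freqs` (Hz).

def identity(env, freqs):
    return env

def scale(env, freqs, a=1.2):
    return np.interp(freqs, freqs * a, env)        # stretch the envelope along frequency

def shift(env, freqs, df=200.0):
    return np.interp(freqs, freqs + df, env)       # translate the envelope upward by df

def warp(env, freqs, gamma=0.8):
    warped_axis = freqs.max() * (freqs / freqs.max()) ** gamma
    return np.interp(freqs, warped_axis, env)      # non-linear frequency warping

def multiply(env1, env2):
    return env1 * env2                             # e.g. cross-synthesis of two envelopes

def interpolate(env1, env2, t=0.5):
    return (1 - t) * env1 + t * env2               # e.g. envelope morphing

freqs = np.linspace(0.0, 4000.0, 256)
env = np.exp(-freqs / 1500.0)                      # toy decaying envelope
shifted = shift(env, freqs)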
4.2. Classification Based on the Type of Control
The second discipline-specific classification takes the standpoint
of the composer and the performer, and is based on the type of
control that audio effects offer or require [2, 8]: constant or variable. If variable, the control can be provided by a wave generator – a periodic or low-frequency oscillator (LFO): sinusoidal, triangular, exponential/logarithmic – or by arbitrary generators, such as gestural control (real-time user-defined control), automation (offline user-defined control) and adaptive control (sound-defined control, using sound descriptors that represent musical gestures). This classification complements the previous ones and appeals to performers, composers and developers. It can also help define a general framework and design new audio effects [2, 10].
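The control types listed above can be sketched as control-signal generators (an illustrative NumPy sketch; the short-term RMS energy stands in for the sound descriptors mentioned above):

import numpy as np

def lfo_control(n, sr, rate_hz=2.0):
    """Periodic control signal (sinusoidal LFO), normalised to [0, 1]."""
    t = np.arange(n) / sr
    return 0.5 * (1.0 + np.sin(2 * np.pi * rate_hz * t))

def adaptive_control(x, block=512):
    """Adaptive control: derive the parameter from a sound descriptor, here the
    short-term RMS energy, held constant over each block and normalised to [0, 1]."""
    rms = np.array([np.sqrt(np.mean(x[i:i + block] ** 2))
                    for i in range(0, len(x), block)])
    rms = np.repeat(rms, block)[:len(x)]
    return rms / (rms.max() + 1e-12)

sr = 8000
x = 0.5 * np.sin(2 * np.pi * 220 * np.arange(2 * sr) / sr) * np.linspace(0, 1, 2 * sr)
constant = np.full(len(x), 0.3)      # constant control
lfo = lfo_control(len(x), sr)        # wave-generator (LFO) control
adaptive = adaptive_control(x)       # sound-defined (adaptive) control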
4.3. Classification Based on Perceptual Attributes
Finally, audio effects can be classified according to the various
perceptual attributes they modify:
• pitch: height and chroma; melody, intonation, contour, harmony, harmonicity; glissando;
• dynamics: nuances, phrasing (legato and pizzicato), accents, tremolo;
• time: duration, tempo, rhythm, accelerando/decelerando;
• space: localization (distance, azimuth, elevation) and sound motion; room effect (reverberation, echo); directivity;
• timbre: formants (color); brightness (or spectral height), quality; metamorphosis; texture, harmonicity; vibrato, trill, Flatterzunge, legato, pizzicato.
This classification was introduced in the context of content-based
transformations [3], and adaptive audio effects [2, 10].
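Such a perceptual classification can be encoded directly, for instance as a mapping from each effect to its main and secondary attributes; the few entries below are illustrative examples consistent with the classification above, not an exhaustive list:

# Toy encoding of the perceptual layer: each effect is mapped to the main
# perceptual attribute it modifies and to secondary ones (illustrative entries only).
PERCEPTUAL_LAYER = {
    "tremolo":        {"main": "dynamics", "secondary": []},
    "vibrato":        {"main": "pitch",    "secondary": ["timbre"]},
    "reverberation":  {"main": "space",    "secondary": ["timbre"]},
    "pitch-shifting": {"main": "pitch",    "secondary": ["timbre"]},
    "distortion":     {"main": "timbre",   "secondary": ["dynamics"]},
    "time-scaling":   {"main": "time",     "secondary": []},
}

def effects_modifying(attribute):
    """List the effects whose main modified attribute is `attribute`."""
    return [name for name, attrs in PERCEPTUAL_LAYER.items()
            if attrs["main"] == attribute]

print(effects_modifying("pitch"))   # ['vibrato', 'pitch-shifting']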
For the main audio effects, Table 1 indicates the main and secondary perceptual attributes modified, along with complementary information for programmers and users about real-time implementation and control type. Another way to represent the various links between an effect and perceptual attributes uses a heuristic map [21], as depicted in Fig. 4, where audio effects are linked in the center to the main perceptual attribute they modify. Other sub-attributes are also introduced. For the sake of simplicity, audio effects are attached to the center only for the main modified perceptual attributes. When other attributes are slightly modified, they are indicated on the opposite side, i.e. at the figure bounds.
Table 1: Digital audio effects according to the perceptual attributes they modify (L: loudness, D: duration and rhythm, P: pitch and harmony, T: timbre and quality, S: space), listing for each effect the main and secondary attributes modified, whether a real-time (RT) implementation is possible (‘—’ means ‘not possible’), and the built-in control type (A: adaptive, cross-A: cross-adaptive, LFO). The effects covered are: compressor, limiter, expander, noise gate, gain/amplification, normalisation, tremolo, violoning, inversion, time-scaling with formant preservation, time-scaling with attack preservation, time-scaling with vibrato preservation, rhythm/swing change, pitch-shifting with and without formant preservation, pitch change, pitch discretization (auto-tune), smart harmony, (in-)harmonizer, distance change, directivity, Doppler, echo, granular delay, panning, reverberation, spectral panning, Rotary/Leslie, filter, arbitrary-resolution filter, comb filtering, equalizer, resonant filter, wah-wah, auto-wah, envelope shifting, envelope scaling, envelope warping, spectral centroid change, chorus, flanger, phaser, spectrum shifting, adaptive ring modulation, texture change, distortion, fuzz, overdrive, mutation, spectral interpolation, vocoding, cross-synthesis, voice morphing, timbral metamorphosis, timbral morphing, whispering/hoarseness, de-esser, declicking, denoising, exciter, enhancer, spectral compressor, gender change, intonation change, martianisation, prosody change, resampling, ring modulation, robotisation, spectral tremolo, spectral warping, time shuffling and vibrato.
Figure 4: Perceptual classification of various audio effects, represented as a heuristic map linking each effect to the main perceptual attribute it modifies (loudness, duration, pitch, timbre, space) and to perceptual sub-attributes (formants, harmonicity, brightness, localization, room, quality, directivity, etc.). Bold italic words are perceptual attributes (pitch, loudness, etc.), italic words are perceptual sub-attributes (formants, harmonicity, etc.), and the other words refer to the corresponding audio effects.
When other perceptual attributes are slightly modified by an audio effect, those links are not drawn to the center, in order not to overload the heuristic map, but rather toward the outer edge of the map.
When the user chooses an effect to modify one attribute, the
implementation technique used may introduce artifacts, implying
modifications of other attributes. We differentiate the perceptual
attributes that we primarily want to modify (‘main’ perceptual attributes, and the corresponding dominant modification perceived)
and the ‘secondary’ perceptual attributes that are slightly modified
(on purpose or as a by-product of the signal processing).
A perceptual classification has the advantage of presenting audio effects according to the way they are perceived, taking into
account the audible artifacts of the implementation techniques. Of
course, none of the proposed classifications is perfect, and each depends on the goal we have in mind when using it. However, for sharing and spreading knowledge about audio effects between DSP programmers, composers and listeners, this classification offers a vocabulary dealing with our auditory perception of the sound produced by the audio effect, which we all share since we are all listeners in the communication chain (see section 1).
5. INTERDISCIPLINARY CLASSIFICATIONS
This section recalls sound effect classifications and then introduces
an interdisciplinary classification linking the different layers of domain-specific descriptors.
5.1. Sound Effects Classifications
Sound effects have been thoroughly investigated in electroacoustic music. For instance, Schaeffer classified sounds according to
matter (harmonicity and noisiness), form (dynamics) and variation (melodic profile) [22]. In the context of ecological acoustics, Schafer [23] introduced the idea that soundscapes reflect human activities. He proposed four main categories of environmental
sounds: mechanical sounds (traffic and machines), human sounds
(voices, footsteps), collective sounds (resulting from social activities) and sounds conveying information about the environment
(warning signals or spatial effects).

Figure 5: Transverse diagram for the chorus effect, linking semantic descriptors (‘warm sound’, ‘several performers’), the perceptual attribute (timbre), the control type, the applied processing (transposition, time-scaling, resampling), the processing domain and the digital implementation technique. The chorus is applied by mixing modified versions of the signal (slight pitch-shifting, time-scaling and spectral envelope modification), either by using the usual white-noise-modulated delay line, or by using a sound model (such as the phase vocoder, SOLA or an additive model).
Gaver [24] also introduced the distinction between musical listening and everyday listening. Musical listening focuses on perceptual attributes of the sound itself (e.g. pitch, loudness), whereas
everyday listening focuses on events to gather relevant information
about our environment (e.g. a car approaching), that is, not about the
sound itself but rather about sound sources and actions producing
sound. Recent research on soundscape perception validated this
view by showing that people organize familiar sounds on the basis
of source identification. But there is also evidence that the same
sound can give rise to different cognitive representations which integrate semantic features (e.g. meaning attributed to the sound)
into physical characteristics of the acoustic signal (see [25] for a
review). Therefore, semantic features must be taken into consideration when classifying sounds, but they cannot be matched with
physical characteristics in a one-to-one relationship.
Similarly, audio effects give rise to different semantic interpretations depending on how they are implemented or controlled.
Semantic descriptors were investigated in the context of distortion
[26] and different standpoints on reverberation were summarized
in [27]. Our contribution is to propose an interdisciplinary classification of audio effects in an attempt to bridge the gaps between
discipline-specific classifications by extending previous research
on isolated audio effects.
5.2. Interdisciplinary Audio Effects Classification
The proposed interdisciplinary classification links the various
layers of discipline-specific classifications presented in section
4, ranging from low-level to high-level features as follows:
digital implementation technique, processing domain, applied
processing, control type, perceptual attributes, and semantic
descriptors. The first example concerns the chorus effect (Fig. 5).
The usual implementation involves one or many delay lines, with
modulated length and controlled by a white noise. An alternative
and more realistic sounding implementation consists in using
several slightly pitch-shifted and time-scaled versions of the same
sound, and mixing them together. In this case, the resulting audio
effect sounds more like a chorus of people or instruments playing
the same harmonic and rhythmic patterns together. The second
example in Fig. 6 illustrates various control types for the wah-wah effect: gestural (with a foot pedal), periodic (with an LFO) and adaptive (controlled by the attack and a release time). We can see here the importance of specifying the control type as part of the effect definition. Another example is delay-line modulation, which results in a chorus when modulated by white noise, and in a flanger when modulated by an LFO.

Figure 6: Transverse diagram for the wah-wah effect: the control type defines the effect's name, i.e. wah-wah (gestural control), automatic wah-wah (with an LFO) or sensitive wah-wah (adaptive control). The wah-wah effect itself can be applied using a delay-line modulation technique, or a sound model (such as the phase vocoder or an additive model).
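The delay-line modulation example above (chorus when modulated by white noise, flanger when modulated by an LFO) can be sketched as a single routine whose modulator determines the resulting effect (an illustrative NumPy sketch; delay ranges and rates are typical rather than prescriptive):

import numpy as np

def modulated_delay(x, sr, max_delay_s, modulator, mix=0.5):
    """Delay line whose instantaneous length is driven by `modulator` (values in [0, 1])."""
    n = np.arange(len(x))
    delay = modulator * max_delay_s * sr                  # delay in samples, per sample
    delayed = np.interp(np.maximum(n - delay, 0), n, x)   # fractional-delay read (linear interp.)
    return x + mix * delayed

def chorus(x, sr):
    """Chorus: the delay modulation is driven by (smoothed) white noise."""
    rng = np.random.default_rng(0)
    noise = rng.uniform(0.0, 1.0, len(x))
    modulator = np.convolve(noise, np.ones(2048) / 2048.0, mode="same")  # crude low-pass
    return modulated_delay(x, sr, max_delay_s=0.030, modulator=modulator)

def flanger(x, sr, rate_hz=0.25):
    """Flanger: the same delay modulation, but driven by an LFO."""
    t = np.arange(len(x)) / sr
    modulator = 0.5 * (1.0 + np.sin(2 * np.pi * rate_hz * t))
    return modulated_delay(x, sr, max_delay_s=0.003, modulator=modulator)

sr = 8000
x = 0.5 * np.sin(2 * np.pi * 220 * np.arange(2 * sr) / sr)
y_chorus, y_flanger = chorus(x, sr), flanger(x, sr)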
6. CONCLUSIONS
This article summarizes various classifications of audio effects elaborated in different disciplinary fields. It then proposes an interdisciplinary classification linking the different layers of domain-specific features, which aims to facilitate knowledge exchange between the fields of musical acoustics, signal processing, psychoacoustics and cognition.
Not only do we address the classification of audio effects, but
we further explain the relationships between structural and control parameters of signal processing algorithms and the perceptual
sub-attributes of timbre modified by audio effects. This interdisciplinary classification emerged from our experience using audio effects and teaching them to engineers and musicians 4. We are convinced that a wider range of audiences, including electronic music composers, would benefit from this interdisciplinary approach [28]. Indeed, a generalization of this classification to all audio effects would have a strong impact on pedagogy, knowledge sharing across disciplinary fields and musical practice. For example, it is well known that DSP engineers conceive better tools when they know how these tools can be used in a musical context. Furthermore, linking perceptual features to signal processing techniques will enable
the development of more intuitive user interfaces providing control over high-level perceptual and cognitive attributes rather than
low-level signal parameters. An example of a system providing a
4 C. Traube teaches sound synthesis and audio effects to electroacoustic
composers at the Université de Montréal, Canada. V. Verfaille teaches
digital audio effects to computer and electrical engineers at the ENSEIRB,
France. C. Guastavino teaches multimedia systems at McGill University.
mapping from perceptual attributes to audio effect control parameters is the Spatializer developed at IRCAM, where the user can
manipulate perceptual attributes to control audio effects [29].
Similar systems are envisaged to generalize this approach to
other audio effects. However, research on semantic descriptors,
the higher-level layer in our interdisciplinary classification, still remains at a very early stage. Further research is needed to investigate verbal descriptors and to identify relationships between semantic features and relevant correlates of lower-level attributes.
Future directions also include the development of tools for
communicating, visualizing, and navigating within the interdisciplinary classification of audio effects, as shown in the figures. Depending on users’ needs and expertise, information for each effect
can be represented at the level of one of the layers corresponding to
domain-specific classifications, or using the interdisciplinary classification to navigate between layers. This information could be
collaboratively collected using a Wiki, and organized as complex
trees and networks, to allow for navigation and exploration. Users
could easily locate features that are relevant to them for the task
at hand. They could then accordingly identify groups of relevant
audio effects, for instance on the basis of perceptual attribute(s)
modified, underlying signal processing techniques, or type of control used, etc. This generalized classification system would best
meet the needs of a wide range of users with differing strategies,
goals and expertise.
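As a sketch of such a navigation tool (illustrative entries only, following the layer names of section 5.2), the classification could be stored as a layered dictionary and queried across layers:

# Layered description of two effects, with a query that navigates from one layer
# (e.g. a perceptual attribute) to the matching effects. Entries are illustrative.
EFFECTS = {
    "chorus": {
        "semantic_descriptors": ["warm sound", "several performers"],
        "perceptual_attributes": ["timbre"],
        "control_type": ["white noise", "adaptive"],
        "applied_processing": ["delay-line length modulation",
                               "pitch-shifting + time-scaling + mixing"],
        "processing_domain": ["time domain", "frequency domain"],
        "implementation_technique": ["delay line", "SOLA", "phase vocoder", "additive model"],
    },
    "wah-wah": {
        "semantic_descriptors": ["closed/opened vowels"],
        "perceptual_attributes": ["timbre"],
        "control_type": ["gestural", "LFO", "adaptive"],
        "applied_processing": ["resonant filter", "vocal filter"],
        "processing_domain": ["time domain", "frequency domain"],
        "implementation_technique": ["time-domain filter", "phase vocoder", "additive model"],
    },
}

def find_effects(layer, value):
    """Which effects match `value` in a given `layer` of the classification?"""
    return [name for name, layers in EFFECTS.items() if value in layers[layer]]

print(find_effects("perceptual_attributes", "timbre"))   # ['chorus', 'wah-wah']
print(find_effects("control_type", "adaptive"))          # ['chorus', 'wah-wah']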
7. REFERENCES
[1] U. Zölzer, DAFX – Digital Audio Effects. J. Wiley & Sons,
2002.
[2] V. Verfaille, “Effets audionumériques adaptatifs : Théorie,
mise en œuvre et usage en création musicale numérique,”
Ph.D. dissertation, Université Aix-Marseille II, 2003.
[3] X. Amatriain, J. Bonada, A. Loscos, J. L. Arcos, and V. Verfaille, “Content-based transformations,” J. New Music Research, vol. 32, no. 1, pp. 95–114, 2003.
[4] S. Orfanidis, Introduction to Signal Processing. Prentice
Hall Int. Editions, 1996.
[5] G. D. Poli, A. Picialli, S. T. Pop, and C. Roads, Musical Signal Processing. Eds. Swets & Zeitlinger, 1996.
[6] C. Roads, The Computer Music Tutorial. Cambridge, Massachusetts: MIT Press, 1996.
[7] F. R. Moore, Elements of Computer Music. Englewood Cliffs, N.J.: Prentice Hall, 1990.
[8] V. Verfaille, M. M. Wanderley, and P. Depalle, “Mapping
strategies for gestural control of adaptive digital audio effects,” J. New Music Research, vol. 35, no. 1, pp. 71–93,
2006.
[9] A. Camurri, G. D. Poli, and D. Rocchesso, “A taxonomy for
sound and music computing,” Computer Music J., vol. 19,
no. 2, pp. 4–5, 1995.
[10] V. Verfaille, U. Zölzer, and D. Arfib, “Adaptive digital audio
effects (A-DAFx): A new class of sound transformations,”
IEEE Trans. Audio, Speech and Language Proc., vol. 14,
no. 5, pp. 1817–1831, 2006.
[11] C. A. Rabassó, “L’improvisation: du langage musical au langage littéraire,” Intemporel: bulletin de la Société Nationale
de Musique, vol. 15, 1995, [Online] http://catalogue.ircam.
fr/hotes/snm/itpr15rabatxt.html.
[12] D. Hargreaves, D. Miell, and R. MacDonald, “What do we mean by musical communication, and why is it important?” introduction to the ‘Musical Communication (Part 1)’ session, in Proc. Int. Conf. on Music Perception and Cognition (ICMPC, CD-ROM), 2004.
[13] J.-J. Nattiez, Fondements d’une sémiologie de la musique.
Paris: U. G. E., Coll. 10/18, 1975.
[14] J. Molino, “Fait musical et sémiologie de la musique,”
Musique en jeu, vol. 17, pp. 37–62, 1975.
[15] E. Moulines and F. Charpentier, “Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones,” Speech Communication, vol. 9, no. 5/6, pp.
453–67, 1990.
[16] J. Laroche, “Time and pitch scale modification of audio signals,” in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds. Kluwer Academic Publishers, 1998, pp. 279–309.
[17] M. Portnoff, “Implementation of the digital phase vocoder
using the fast Fourier transform,” IEEE Trans. Acoust.,
Speech, and Signal Proc., vol. 24, no. 3, pp. 243–8, 1976.
[18] D. Arfib, F. Keiler, and U. Zölzer, “Time-frequency processing,” in DAFX – Digital Audio Effects, U. Zölzer, Ed. J.
Wiley & Sons, 2002, pp. 237–97.
[19] R. J. McAulay and T. F. Quatieri, “Speech analysis/synthesis
based on a sinusoidal representation,” IEEE Trans. Acoust.,
Speech, and Signal Proc., vol. 34, no. 4, pp. 744–754, 1986.
[20] V. Verfaille and P. Depalle, “Adaptive effects based on STFT,
using a source-filter model,” in Proc. Int. Conf. on Digital
Audio Effects (DAFx-04), Naples, Italy, 2004, pp. 296–301.
[21] T. Buzan and B. Buzan, Mind Map Book. Plume, 1996.
[22] P. Schaeffer, Traité des Objets Musicaux. Seuil, 1966.
[23] R. M. Schafer, The Tuning of the World. Knopf: New York,
1977.
[24] W. W. Gaver, “What in the world do we hear? an ecological
approach to auditory event perception,” Ecological Psychology, vol. 5, no. 1, pp. 1–29, 1993.
[25] C. Guastavino, B. F. Katz, J.-D. Polack, D. J. Levitin, and
D. Dubois, “Ecological validity of soundscape reproduction,” Acta Acustica united with Acustica, vol. 91, no. 2, pp.
333–341, 2005.
[26] A. Marui and W. L. Martens, “Perceptual and semantic scaling for user-centered control over distortion-based guitar effects,” in 110th Conv. Audio Eng. Soc., Amsterdam, the
Netherlands, 2001, preprint 5387.
[27] B. Blesser, “An interdisciplinary synthesis of reverberation
viewpoints,” J. Audio Eng. Soc., vol. 49, no. 10, pp. 867–
903, 2001.
[28] L. Landy, “Reviewing the musicology of electroacoustic music: a plea for greater triangulation,” Org. Sound, vol. 4,
no. 1, pp. 61–70, 1999.
[29] J.-P. Jullien, E. Kahle, M. Marin, O. Warusfel, G. Bloch, and
J.-M. Jot, “Spatializer: a perceptual approach,” in 96th Conv.
Audio Eng. Soc., Amsterdam, the Netherlands, # 3465, 1993,
pp. 1–13.