The Semantics of Timbre: Charalampos Saitis and Stefan Weinzierl
5.1 Introduction
After consultations with his teacher and with the great violinist and collector Efrem
Zimbalist … Yehudi [Menuhin] played on all three [Stradivari violins] and opted for the
“Khevenhüller.” (As a test piece he played “The Prayer” from Handel’s Dettingen Te
Deum.) It was to be his principal instrument for over 20 years. He described it as “ample
and round, varnished in a deep, glowing red, its grand proportions … matched by a sound
at once powerful, mellow and sweet.” Antonio Stradivarius had made the instrument in
1733, his 90th year, when despite his advancing years he was still at the peak of his powers
(Burton 2016, p. 86).
What is a mellow and sweet sound? Imagine yourself listening to a recording of the
famous violinist Yehudi Menuhin (1916–1999) performing on his Khevenhüller
Strad. How would you describe the sound of the violin or the sound of Menuhin?
What about the sound quality of the recording? Musicians, composers, sound art-
ists, listeners, acousticians, musical instrument makers, audio engineers, scholars of
sound and music, even sonar technicians, all share a subtle vocabulary of verbal
attributes when they need to describe timbral qualities of sounds. These verbaliza-
tions are not crucial for processing timbre—listeners can compare (McAdams,
Chap. 2), recognize (Agus, Suied, and Pressnitzer, Chap. 3), or memorize and imag-
ine (Siedenburg and Müllensiefen, Chap. 4) timbral qualities without having to
name them (Wallmark 2014). However, the way we talk about sensory experiences
can disclose significant information about the way we perceive them (Dubois 2000;
Thiering 2015). Menuhin’s mellow and sweet sound is a particular concept, an
abstract yet structured idea anchored to and allowing one to make sense of a particu-
lar perceptual representation (Wallmark 2014). As such, a relation must exist
between the physical properties of a sound that give rise to timbre and its semantic
description.
Results of multidimensional scaling of pairwise sound dissimilarity ratings
(McAdams, Chap. 2) usually show that timbre may be adequately explained on the
basis of just two or three dimensions, a number many times smaller than the plethora
of words and phrases used to communicate timbral impressions. On the one
hand, this might be due to specific perceptual features of individual sounds (referred
to as specificities) that are not mapped onto the shared dimensions of the prevailing
timbre space. For example, the suppression of even harmonics in clarinet tones,
which typically elicits an impression of hollowness, was not accounted for by clas-
sic geometric timbre models alone (e.g., McAdams et al. 1995). On the other hand,
individual verbalizations can be thought of as representing microconcepts—basic
elements of semantic knowledge activated by a stimulus object that are not fully
meaningful on their own but instead yield meaning when assembled into broader
semantic categories (Saitis et al. 2017). Among the diverse timbre vocabulary,
therefore, many seemingly unassociated words may share the same meaning and
refer to the same perceptual dimension.
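As a rough illustration of how a low-dimensional timbre space can be derived from pairwise dissimilarity ratings, the following sketch applies classical (Torgerson) multidimensional scaling with NumPy. It is a simplification under stated assumptions: timbre studies often use nonmetric or weighted MDS variants instead, and the instrument labels and rating values below are invented purely for illustration.

```python
import numpy as np


def classical_mds(dissim, n_dims=2):
    """Embed items in n_dims dimensions from a symmetric dissimilarity matrix."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip small negative eigenvalues, which arise when ratings are not Euclidean.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical mean dissimilarity ratings (0 = identical, 1 = very different).
labels = ["flute", "clarinet", "trumpet", "oboe"]
ratings = [
    [0.0, 0.3, 0.8, 0.6],
    [0.3, 0.0, 0.7, 0.4],
    [0.8, 0.7, 0.0, 0.5],
    [0.6, 0.4, 0.5, 0.0],
]
coords = classical_mds(ratings, n_dims=2)
for label, (x, y) in zip(labels, coords):
    print(f"{label:>8}: ({x:+.2f}, {y:+.2f})")
```

Each row of `coords` places one sound in a two-dimensional space; interpreting the axes (for example, as brightness) still requires correlating them with acoustic descriptors or verbal ratings.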
Accordingly, the main goals of the research ideas and tools discussed in this
chapter are twofold: first, to identify the few salient semantic substrates of linguistic
descriptions of timbral impressions that can yield consistent and differentiating
responses to different timbres, along with their acoustic correlates; and second, to quantify
the relationship between perceptual (similarity-based) and semantic (language-based)
representations for timbre. Important questions include the following:
• How similar are semantic timbre spaces between different categories of sound
objects, for example, between instrument families and between instruments,
voices, and nonmusical sounds?
Listening to a sound (speech, music, environmental events, etc.) involves not only
detection-perception of the acoustic signal, but also the interpretation of auditory
information (e.g., pitch or the lack thereof, timbre, duration, dynamics). According
Wake and Asahi (1998) presented musical, vocal, and environmental stimuli to pairs of
naïve listeners to study how they describe different types of sounds. Unlike sound
experts (i.e., musicians, composers, sound artists, recording engineers, sound and
music scholars), naïve listeners lack a specialized auditory vocabulary. One person
in each pair listened to a sound and subsequently described it to their interlocutor,
who then had to imagine the described sound and, after listening to the actual
stimulus, assess the similarity between the two. The verbalizations used to convey
the different sounds were mainly of three types. The first type describes the percep-
tion of the sound itself using onomatopoeias (i.e., words or vocables considered by
convention to phonetically mimic or suggest the sound to which they refer; e.g.,
chirin-chirin for the sound of a wind bell) or acoustic terminology (e.g., high
pitched). The second type describes the recognition of the sounding situation using
references to the object that made the sound (e.g., a bird) or the action that produced
it (e.g., twittering) or other contextual information (e.g., in the morning). The third
type describes the sound impression using metaphors and similes (e.g., clear, cool).
Wake and Asahi (1998) proposed a model of auditory information processing,
according to which recognition and impression are processed either independently
(perception then recognition or impression) or sequentially (perception then recog-
nition then impression).
In his empirical ethnographic research on the management of talk about sound
between music professionals in the United States, Porcello (2004) identified five
strategies that are common to the discourse of timbre among producers and engi-
neers: (1) spoken/sung vocal imitations of timbral characteristics; (2) lexical ono-
matopoeic metaphors; (3) pure metaphor (i.e., non-onomatopoeic, generally
referencing other sensory modalities or abstract concepts); (4) association (citing
styles of music, musicians, producers, etc.); (5) evaluation (judgements of aesthetic
and emotional value). Thus, a snare drum might sound like /dz:::/ and a muted trom-
bone like wha-wha, a wolf tone on the cello (a persistent beating interaction between
string vibrations and sympathetic body resonances) is usually howling and rough or
harsh, and a violin tone might sound baroque or like Menuhin or beautiful. In com-
parison to the taxonomy of Wake and Asahi (1998), Porcello (2004) distinguishes
between lexical onomatopoeias and vocal mimicry of nonvocal timbres, including
in the latter category nonlexical onomatopoeias, and also considers three types of
sound impression descriptions: pure metaphor, association, and evaluation.
Porcello (2004) further advances a distinction between vocal imitations and ono-
matopoeias on the one hand (which he calls “sonic iconicity”) and the pure iconicity
of metaphors originating in nonauditory sensory experiences or abstract concepts
on the other hand. These, he observes, are usually “codified, especially among
musicians and sound engineers” (Porcello 2004, p. 747). Following their investigation
of the relation between verbal description and gestural control of piano timbre,
Bernays and Traube (2009, p. 207) similarly concluded that “high level performers
… have developed over the years of practice … an acute perceptive sensibility to
slight sonic variations. This … results in an extensive vocabulary developed to
describe the nuances a performer can detect.” Furthermore, as noted by Traube
(2004), this vocabulary is traditionally communicated from teacher to student in
both the musician and sound engineer communities.
Lemaitre and colleagues (2010) analyzed free sortings of environmental sounds
made by expert and nonexpert listeners along with scores of source-cause identifica-
tion confidence and source-cause verbalizations. For the latter, participants were
asked to provide nonmetaphorical nouns and verbs to describe the object and action
that produced each sound. Participants were also asked to describe what sound
properties they considered in grouping different sounds together. The authors showed that
naïve listeners categorized environmental sounds primarily on the basis of source-cause
properties. When these could not be identified, nonexpert listeners turned to
the timbral properties of the sound, which they described using metaphors or vocal
imitations. In contrast, musicians and other expert listeners relied more on timbral
characteristics, verbalizing them using metaphors almost exclusively. This finding
may offer support to the auditory information processing model proposed by Wake
and Asahi (1998), who assert that timbral impression is processed independently of
or following source recognition. It could also help to explain why Porcello’s tax-
onomy of timbre verbalizations, which is derived from the discourse of sound
experts, does not include descriptions of the physical cause of a sound, such as those
grouped under “sounding situation” by Wake and Asahi (whose taxonomy is based
on verbalizations by nonexpert listeners).
Wallmark (2018) conducted a corpus linguistic analysis of verbal descriptions of
instrumental timbre across eleven orchestration treatises. The collected verbaliza-
tions were categorized according to: (1) affect (emotion and aesthetics); (2) matter
(physical weight, size, shape); (3) crossmodal correspondence (borrowed from
other senses); (4) mimesis (sonic resemblance); (5) action (physical action, move-
ment); (6) acoustics (auditory terminology); and (7) onomatopoeia (phonetic resem-
blance). This scheme is very similar to the one suggested by Porcello (2004), whose
notion of “pure” metaphor could be seen as encompassing categories (2) to (6).
Whereas onomatopoeic words were prevalent among music producers and engi-
neers in Porcello’s study, they accounted for a mere 2% of Wallmark’s orchestration
corpus, driven primarily by a small number of mostly percussion instruments. In
fact, certain instruments and instrument families were found to have a systematic
effect on verbal description category. For example, the trombone was described
more frequently with affect and mimesis than other brass instruments, while the
violin, viola, and cello all shared similar descriptive profiles (cf., Saitis et al. 2017).
By means of principal components analysis, the seven categories were further
reduced to three latent dimensions of musical timbre conceptualization: material
(with positive loadings from onomatopoeia and matter), sensory (crossmodal and acoustics),
and activity (action and mimesis).
Notwithstanding the diverse metaphorical timbre lexicon in orchestration books,
taxonomies of musical instruments and the kinds of sounds they produce are usually
based on the nature of the sound-producing material and mechanism. Koechlin
(1954–1959; cited in Chiasson et al. 2017, pp. 113–114) proposed instead to organize
instrument sounds for orchestration purposes on the basis of volume and intensity.
Volume is described as an impression of how much space an instrument sound occu-
pies in the auditory scene (“extensity” is used by Chiasson et al. 2017; see also Rich
1916). Based on an inverse relationship between volume and intensity, Koechlin
(cited in Chiasson et al. 2017) further proposed a third attribute of density versus
transparency: a musical sound is dense when it is loud but with a small volume, and
it is transparent when it has a large volume but low intensity. There is evidence that
in the later Middle Ages it was typical to think of musical instruments in terms of
volume of sound (Bowles 1954). In orchestras, and for other musical events, instru-
ments with a big, loud sound (haut in French) would be grouped together against
those with a small, soft sound (bas).
Schaeffer (1966) offered a typo-morphology of “sonorous objects” (i.e., sounds
experienced by attending to their intrinsic acoustic properties and not to their physi-
cal cause) based on sustainment (facture in French) and mass. Sustainment refers to
the overall envelope of the sound and mass is described as “the quality through
which sound installs itself … in the pitch field” (Schaeffer 1966, p. 412), which
appears similar to Koechlin’s notion of volume. Interestingly, Koechlin and
Schaeffer were both French, shared a composition background, and published their
typologies within 10 years of each other. Mass extends the concept of pitch in pure
tones (i.e., single frequencies) and tonal sounds (i.e., nonnoisy) to include sounds
with fluctuating or indeterminate pitch (e.g., cymbals, white noise). Each mass has
a particular timbre associated with it—a set of “secondary” qualities that are either
nonexistent (pure tones) or exist at varying degrees from being dissociated (musical
notes) to indistinguishable (white noise) from mass. Given the definition of sono-
rous objects, Schaeffer’s timbre is free from any source-cause associations and is
thus situated clearly in the realm of quality as opposed to identity (Siedenburg,
Saitis, and McAdams, Chap. 1).
In tonal sounds, Schaeffer argues, mass can be low or high (in terms of location
in the pitch field) and thick or thin (in terms of extensity in the pitch field); timbre
can be dark or light (location), ample or narrow (extensity), and rich or poor (in
relation to the intensity of the mass). The latter appears closely related to Koechlin’s
notion of density as they both describe a mass or volume, respectively, in relation to
its intensity. In Smalley’s (1997) Theory of Spectromorphology, which has its ori-
gins in Schaeffer’s ideas, pitch field is replaced by “spectral space”. The latter is
described in terms of emptiness versus plenitude (whether sound occupies the whole
space or smaller regions) and of diffuseness versus concentration (whether sound is
spread throughout the space or concentrated in smaller regions). Like Koechlin and
Schaeffer, Smalley also relies on extra-auditory concepts to serve as discourse for
an organization of auditory material that focuses on intrinsic features of the sound
independently of its source.
Wallmark (2014) argues that the metaphorical description of timbre is not simply a
matter of linguistic convention, and what Porcello singles out as “pure metaphor” is
central to the process of conceptualizing timbre by allowing the listener to commu-
nicate subtle acoustic variations in terms of other more commonly shared sensory
experiences (nonauditory or auditory-onomatopoeic) and abstract concepts. De
Ceuster (2016) points out that timbre has been described with metaphors based on
experiences since the presumed birth of the term in the mid-eighteenth century
(Dolan 2013). Jean-Jacques Rousseau’s “Tymbre” entry in Diderot and D’Alembert’s
Encyclopédie reads:
A sound’s tymbre describes its harshness or softness, its dullness or brightness. Soft
sounds, like those of a flute, ordinarily have little harshness; bright sounds are
often harsh, like those of the vielle [medieval ancestor to the modern violin] or
the oboe. There are even instruments, such as the harpsichord, which are both
dull and harsh at the same time; this is the worst tymbre. The beautiful tymbre is
that which combines softness with brightness of sound; the violin is an example
(cited and translated in Dolan 2013, p. 56).
termed smallness: the lower the first and second formants are, the smaller the vowel.
Schumann (1929), Reuter (1997), and Lembke and McAdams (2015), among
others, have discussed the vowel-like pitch-invariant formant structure of many (but
not all) musical instruments and its role in timbre perception.
In other words, timbre can be experienced with reference to the human and non-
human voice—a conceptualization already evident in Helmholtz’s (1877) choice to
synthesize vowel-like sounds for his Klangfarbe experiments and in Schilling’s
definition of the German term as “denoting mostly the accidental properties of a
voice” (Schilling 1840, p. 647; cited in Kursell 2013). Timbre can also be experi-
enced as a material object that can be seen, touched, and even tasted. Furthermore,
noise-like timbres (e.g., excessive high-frequency content, inharmonicity, flat spec-
trum) can be understood in terms of frictional material interaction. Very similar
metaphorical conceptualizations can be found in verbalizations of other perceptual
aspects of sound, such as pitch and loudness (Eitan and Rothschild 2011; Saitis
et al. 2017). In general, conceptual metaphors of timbre and auditory semantics may
originate in more universal neural processes and structures beyond auditory cogni-
tion (cf., Gallese and Lakoff 2005; Walsh 2013).
synonymous were further discarded. The scales soft–loud and low–high were
included to test the effectiveness of loudness and pitch normalization, respectively.
Factor analysis of ratings by a group of musicians and another group of nonmusi-
cians yielded similar, although not identical, four-factor solutions that explained
more than 80% of the variance in the data. The four factors were defined by the
differentials dull–sharp, compact–scattered, full–empty, and colorful–colorless.
Although participants were instructed to ignore pitch and loudness as much as pos-
sible, ratings on the soft–loud and low–high scales were highly correlated with
those on dull–sharp and dark–bright, respectively. This illustrates how the same
word can have different connotations in different contexts. Even when sounds were
equalized in loudness and pitch, listeners still used related attributes to describe
other impressions. In agreement with the view that verbal attributes of timbre are
“codified” among musically trained listeners (see Sect. 5.2.1), ratings from nonmu-
sicians were more scattered than those of musicians. Prompted by the finding that
the dull–sharp factor explained almost half of the total variance in the data, von
Bismarck (1974b) confirmed in subsequent psychoacoustic experiments that a dull–
sharp scale had desirable measurement properties (e.g., doubling, halving) and con-
cluded that sharpness may represent an attribute of sounds distinguishable from
pitch and loudness.
Von Bismarck’s is arguably the first comprehensive investigation of timbre
semantics, markedly improving upon the earlier studies, but certain aspects have
been questioned. For example, aesthetic-evaluative and affective scales were still
used. In addition, the preliminary assessment of whether or not a scale was suitable
for describing timbre was carried out in an undefined context, without presentation
of the timbres to be described, while further discarding of scales was based on an
arbitrary judgement of word synonymy. Perhaps more importantly, a conceptual issue
with semantic differentials is the assumption of bipolarity that underlies the
model (Heise 1969; Susini et al. 2012). Are soft–loud and dark–bright always true
semantic contrasts? Is sharp the true semantic opposite of dull when talking about
timbre?
One way to address potential biases associated with prescribing antonymic rela-
tionships between adjectives is to use adjective checklists. These were used exten-
sively in musical affect research up until the late 1950s (for a review, see Radocy
and Boyle 2012) but have largely been replaced by semantic scales. Similarly to von
Bismarck (1974a), Pratt and Doak (1976) attempted to first find verbal scales suit-
able for describing timbre. An initial list of 19 “commonly used” adjectives was
reduced to seven items by means of a checklist task. By (arbitrarily) discarding
synonyms and “not very useful” words, the list was further reduced to the attributes
brilliant, rich, and warm; dull, pure, and cold, respectively, were (arbitrarily) chosen
as opposites to form semantic differentials. From ratings of different synthesized
harmonic spectra on the three scales, it was found that the spectra were most consistently
discriminated by the brilliant–dull scale.
In a separate study (Abeles 1979), each of twenty-four recorded isolated clarinet
notes was presented three times, each time with five adjectives randomly selected
from a list of forty words. Three independent groups of clarinetists, nonclarinetist
strong, but not one-to-one, correspondence between semantic and perceptual dimen-
sions of timbre had previously been shown by Samoylenko et al. (1996) and Faure
(2000), who collected free verbalizations during dissimilarity ratings.
the bowed string (via the bow) are used as extra-auditory cues that not only help to
better control the played sound but also contribute to its perceived qualities. For
example, recent research on the evaluation of piano and violin quality has revealed
that an increase in the vibrations felt at the fingertips of pianists and the left hand of
violinists can lead to an increase in perceived sound loudness and richness (Saitis
et al. 2018). Also, impressions like bright and rich mostly refer to the sustained part
of a note, while words like soft tend to describe qualities of transients (cf., Brent
2010; Bell 2015).
An example of constrained verbalization is the repertory grid technique.
Listeners form bipolar constructs (i.e., antonymic pairs of adjectives) by articulating
the difference between two sounds taken from a larger pool that is relevant to the
aims of the task at hand (referred to as elements). Alternatively, three sounds are
presented and listeners are first invited to select the least similar one and subse-
quently to verbally explain their grouping. Finally, listeners are asked to rate all
elements on each new construct. The resulting grid of constructs and elements,
essentially semantic differential ratings, can then be evaluated with factor analyti-
cal, clustering, or multidimensional scaling techniques. Using this method, Grill
(2012) found an expanded semantic space for electroacoustic “textures”, which
combined dimensions pertinent mostly to such sounds (ordered–chaotic or coher-
ent–erratic, homogeneous–heterogeneous or uniform–differentiated) with dimen-
sions commonly found for voices and instruments (high–low or bright–dull,
smooth–coarse or soft–raspy, tonal–noisy).
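To make the structure of a repertory grid concrete, here is a minimal sketch of how such a grid might be stored and explored before any factor analysis or clustering. The construct names echo the dimensions mentioned above, but the sound labels, the 1-7 rating scale, and all values are hypothetical, and the two helper functions are illustrative rather than part of any published procedure.

```python
from math import sqrt

# Hypothetical grid: each construct (bipolar adjective pair) holds one rating
# per element (sound) on an assumed 1-7 scale. All values are invented.
elements = ["sound_a", "sound_b", "sound_c", "sound_d"]
grid = {
    "smooth-coarse": [2, 6, 3, 7],
    "soft-raspy": [1, 6, 2, 6],
    "ordered-chaotic": [5, 4, 2, 3],
}


def pearson(x, y):
    """Pearson correlation between two equally long rating vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def element_distance(grid, i, j):
    """Euclidean distance between elements i and j across all constructs."""
    return sqrt(sum((row[i] - row[j]) ** 2 for row in grid.values()))


# Constructs that correlate strongly may describe the same underlying dimension.
r = pearson(grid["smooth-coarse"], grid["soft-raspy"])
print(f"smooth-coarse vs soft-raspy: r = {r:.2f}")

# Pairwise element distances form a matrix that could be passed on to
# clustering or multidimensional scaling.
n = len(elements)
distances = [[element_distance(grid, i, j) for j in range(n)] for i in range(n)]
```

Highly correlated constructs are candidates for merging into one semantic dimension, which is what the factor analytical step does more systematically.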
A semantic space can also be derived quantitatively through MDS of pairwise
distances in a list of adjectives. Moravec and Štěpánek (2003) initially asked con-
ductors, composers, engineers, teachers, and musicians (three groups of bowed-
string, wind, and keyboard performers) to provide words they typically use to
describe the timbre of any musical instrument. The four most frequently mentioned
words across all respondents (sharp, gloomy, soft, clear) were also among the four
most frequently used in each of the three musician groups. Still, some within-group
preferences were observed. Bowed-string players used sweet and warm more fre-
quently than both keyboard and wind performers. Similarly, narrow was much more
popular with wind musicians. The thirty most frequently reported adjectives were
subjected to dissimilarity ratings (Moravec and Štěpánek 2005) and MDS identified
three dimensions closely matching luminance, texture, and mass (Zacharakis et al.
2014), namely, gloomy/dark–clear/bright, harsh/rough–delicate, and full/wide–nar-
row, respectively.
Edwards (1978) collected a corpus of free verbalizations of trombone sound
quality through interviews and a postal survey of over 300 trombone performers. A
subset of the verbal data was arranged in terms of semantic similarity by the author
himself on the basis of proximities identified in the corpus. This kind of dissimilar-
ity matrix was subsequently subjected to MDS. With respect to timbre, two dimen-
sions of small–wide and dull/round–clear/square emerged. A different subset of the
verbalizations indicated a third timbral aspect referring to “amount” and “carrying”
or “penetrating” properties of sound. These seem to generally agree with the find-
ings of Abeles (1979), Kendall and Carterette (1993b), and Nykänen et al. (2009).
In another trombone study, Pratt and Bowsher (1978) selected the scales compact–
scattered, dull–bright, and not penetrating–penetrating to correspond to Edwards’
three dimensions. It was found that the second and third scales were good discrimi-
nators of trombone timbres but compact–scattered was not. Indeed, the latter may
not indicate size, which is the label Edwards gave to his first dimension, but may
indicate density (see Sect. 5.2.1).
Fritz et al. (2012) had violinists arrange sixty-one adjectives for violin timbre on
a two-dimensional grid (in Excel), so that words with similar meanings lay close
together and those with different meanings lay far apart. The collected grids were
converted into dissimilarity matrices using a custom distance metric between two
cells (see p. 793 in Fritz et al. 2012) and MDS yielded three dimensions: warm/rich/
mellow versus metallic/cold/harsh (richness; texture), bright/responsive/lively ver-
sus muted/dull/dead (resonance; projection), and even/soft/light versus brash/rough/
raspy (texture; clarity). The parenthetical terms potentially correspond to semantic
categories from the cognitive model proposed by Saitis et al. (2017). In both studies,
violinists used words like lively, responsive, ringing, and even bright to describe the
“amount of sound” perceived “under the ear” (resonance) and in relation to spatial
attributes (projection). Differences between the labels of the found semantic dimen-
sions for trombone (wind) and violin (bowed string) timbre seem to generally agree
with those observed by Moravec and Štěpánek (2003).
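The conversion of a spatial word arrangement into a dissimilarity matrix can be sketched as follows. The custom distance metric of Fritz et al. (2012) is not reproduced here; plain Euclidean distance between grid cells is swapped in as an assumption, and the word placements are invented for illustration.

```python
from math import hypot

# Hypothetical placement by one participant: adjective -> (row, column) of the
# grid cell in which the word was put. Positions are invented.
placement = {
    "warm": (0, 0),
    "mellow": (0, 1),
    "harsh": (5, 4),
    "metallic": (4, 4),
}


def grid_dissimilarity(placement):
    """Return the sorted word list and a matrix of Euclidean cell distances."""
    words = sorted(placement)
    matrix = [
        [
            hypot(placement[a][0] - placement[b][0],
                  placement[a][1] - placement[b][1])
            for b in words
        ]
        for a in words
    ]
    return words, matrix


words, matrix = grid_dissimilarity(placement)
for word, row in zip(words, matrix):
    print(f"{word:>9}: " + "  ".join(f"{value:4.2f}" for value in row))
```

Words placed in neighboring cells receive small dissimilarities, so a matrix of this kind (possibly averaged over participants) can serve as input to MDS.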
In the piano study of Bernays and Traube (2011), fourteen adjectives extracted
from spontaneous verbalizations yielded a four-dimensional MDS space. Based on
the first two dimensions (78% of the total variance explained) and additional hierar-
chical clustering, five adjectives were proposed to best represent a semantic space
for piano timbre: bright, dry, dark, round, and velvety. Lavoie (2013) performed
MDS on dissimilarities between adjectives describing classical guitar timbre. In
agreement with Traube (2004), a dimension of velvety/dark–bright/dry was
obtained, related to whether the string is plucked between the sound hole and the
fingerboard versus closer to the bridge (like nasal); a dimension of round/bright–
dull/thin was associated with sound resonance and projection. It is worth noting the
highly similar labels of the reported semantic spaces across the two instruments. To
a certain extent, this may reflect shared conceptualization structures between musi-
cians whose primary instrument produces impulsive string sounds. On the other
hand, given that all three studies were conducted with musicians from the Montreal
region, it may be that these results mirror a verbal tradition specific to that geo-
graphic location, possibly due to a strong influence by one or more particular teach-
ers in the area (cf., Saitis et al. 2017).
orchestral instruments) are rated on verbal scales, but similarities are also evident
when verbal descriptions are collected in the absence of sound examples (e.g., ver-
balization tasks, adjective dissimilarity ratings). The most salient dimensions can be
interpreted broadly in terms of brightness/sharpness (or luminance), roughness/
harshness (or texture), and fullness/richness (or mass). The boundaries between
these dimensions are sometimes blurred, while different types of timbres or sce-
narios of timbre perception evoke semantic dimensions that are specific to each case
(e.g., nasality, resonance/projection, tonalness–noisiness, compact–scattered).
Generally, no striking differences between expert and naïve listeners are observed in
terms of semantic dimensions, although the former tend to be more consistent in
their perceptions than the latter. In this section, the identified semantic dimensions
of timbre are examined further through looking at their acoustic correlates (Sect.
5.4.1) and comparisons between different languages and cultures (Sect. 5.4.2).
with the origin of the latter in psychoacoustic experiments with wideband noise
spectra. However, Almeida et al. (2017) showed that the sharpness model insuffi-
ciently predicted brightness scaling data for tonal sounds. Marozeau and de
Cheveigné (2007) proposed a spectral centroid formula based on the same concept
of weighted partial loudness in critical bands, which better modeled the brightness
dimension of dissimilarity ratings and was less sensitive to pitch variation compared
to the classic spectral centroid descriptor.
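For orientation, the classic spectral centroid descriptor can be sketched as the amplitude-weighted mean frequency of a spectrum. The code below illustrates only this classic form, not the loudness-weighted critical-band formula of Marozeau and de Cheveigné (2007); the fundamental frequency and the two harmonic amplitude profiles are assumptions chosen for illustration.

```python
def spectral_centroid(frequencies_hz, amplitudes):
    """Amplitude-weighted mean frequency of a set of spectral components."""
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * a for f, a in zip(frequencies_hz, amplitudes)) / total


# Two hypothetical harmonic tones on the same fundamental (values invented).
f0 = 220.0
harmonics = [f0 * k for k in range(1, 9)]
steep_rolloff = [1.0 / k ** 2 for k in range(1, 9)]  # weak upper partials
gentle_rolloff = [1.0 / k for k in range(1, 9)]      # stronger upper partials

print(f"steep rolloff:  {spectral_centroid(harmonics, steep_rolloff):.1f} Hz")
print(f"gentle rolloff: {spectral_centroid(harmonics, gentle_rolloff):.1f} Hz")
```

The tone with stronger upper partials has the higher centroid, which is the sense in which the descriptor is taken to track brightness.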
Yet another verbal attribute that has been associated with spectral energy distri-
bution is nasality. Etymologically, nasality describes the kind of vocal sound that
results from coupling the oral and nasal cavities (Sects. 5.2.2 and 5.3.2). However,
it is sometimes used to describe the reinforcement of energy in higher frequencies
at the expense of lower partials (Garnier et al. 2007; Mores 2011). In violin acous-
tics, nasality is generally associated with a strong frequency response in the vicinity
of 1.5 kHz (Fritz et al. 2012). Kendall and Carterette (1993b) found that nasal ver-
sus rich wind instrument sounds had more energy versus less energy, respectively,
in the upper harmonics, with rich timbres combining a low spectral centroid with
increased variations of the spectrum over time. Sounds with a high versus a low
spectral centroid and spectral variation were perceived as reedy versus brilliant,
respectively. Adding a violin note in a set of wind instrument timbres confirmed a
strong link between nasality and the spectral centroid, but rich and brilliant were
correlated only with spectral variation and only to some modest degree (Kendall
et al. 1999). Helmholtz (1877) had originally associated the nasality percept specifi-
cally with increased energy in odd numbered upper harmonics, but this hypothesis
remains unexplored.
Are timbral brightness and sharpness the same percept? Both of them relate to
spectral distribution of energy, and most of the related studies seem to suggest at
least partial similarities, but there is still no definite answer to this question. Štěpánek
(2006) suggested that a sharp timbre is one that is both bright and rough. However,
semantic studies of percussive timbre reveal two independent dimensions of bright-
ness and sharpness/hardness (Brent 2010; Bell 2015). Brighter percussive timbres
appear associated with higher spectral centroid values during attack time, while
sharp/hard relates to attack time itself (i.e., sharper/harder percussive sounds feature
shorter attacks). Attack time refers to the time needed by spectral components to
stabilize into nearly periodic oscillations, and it is known to perceptually distinguish
impulsive from sustained sounds (McAdams, Chap. 2). Furthermore, concerning
brightness, there seems to exist a certain amount of interdependency with fullness.
Sounds that are described as thick, dense, or rich are also described as deep or less
bright and brilliant, while nasality combines high-frequency energy with low spec-
tral spread and variability. The acoustic analyses of Marozeau and de Cheveigné
(2007) and Zacharakis et al. (2015) suggest that brightness may not only relate to
spectral energy distribution but also to spectral detail.
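The relation between attack time and perceived sharpness or hardness can be illustrated with a simple envelope-based estimate. This sketch uses a common threshold proxy (time for the amplitude envelope to rise from 10% to 90% of its peak), which is an assumption and not the definition given above in terms of spectral components stabilizing; the envelopes and the sampling rate are likewise invented.

```python
def attack_time(envelope, sample_rate, low=0.1, high=0.9):
    """Seconds taken by the envelope to rise from low*peak to high*peak."""
    peak = max(envelope)
    if peak <= 0:
        raise ValueError("envelope has no positive peak")
    # Index of the first sample at or above each threshold.
    start = next(i for i, v in enumerate(envelope) if v >= low * peak)
    end = next(i for i, v in enumerate(envelope) if v >= high * peak)
    return (end - start) / sample_rate


# Hypothetical amplitude envelopes sampled at 1 kHz (linear ramps to a plateau).
sample_rate = 1000
hard_hit = [min(1.0, i / 10) for i in range(200)]   # peaks after about 10 ms
soft_hit = [min(1.0, i / 100) for i in range(200)]  # peaks after about 100 ms

print(f"hard hit: {attack_time(hard_hit, sample_rate) * 1000:.0f} ms")
print(f"soft hit: {attack_time(soft_hit, sample_rate) * 1000:.0f} ms")
```

Under this proxy, the envelope that rises faster yields the shorter attack time, matching the direction of the sharp/hard association reported for percussive sounds.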
To further complicate things, a number of studies based on verbalizations that
were collected either directly from musicians or through books and magazines of
music revealed a semantic dimension of timbre associated with a resonant and ring-
ing but also bright and brilliant sound that can project (Sect. 5.3.2). This suggests an
to the first correlation, Bensa et al. (2005) observed that synthetic piano sounds with
the least high-frequency inharmonic partials were perceived as poor, whereas
increasing their number resulted in richer timbres. The second correlation appears
to be in agreement with the connection between richness and high spectral variation
reported for wind instruments by Kendall and Carterette (1993b) and for sustained
instruments by Elliott et al. (2013) and may relate, at least partially, to multiple-
source sounds with higher spectral flux values below 200 Hz that are perceived as
fuller (Alluri and Toiviainen 2010).
The correlation between thickness/density and fundamental frequency found by
Zacharakis et al. (2014) emerged largely due to the presentation of stimuli with dif-
ferent pitches. This acoustic interpretation of thickness/density alludes to an attri-
bute of pure tones described by Stumpf (1890) as volume (Tongröße in German),
which aligns inversely with pitch in that lower/higher pitches are larger/smaller.
Together, the three attributes of volume, pitch, and loudness determine what Stumpf
termed tone color (Tonfarbe). Rich (1916) provided empirical evidence that volume
(he used the word extensity) can be distinct from pitch in pure tones. Terrace and
Stevens (1962) showed that volume can also be perceived in more complex tonal
stimuli, specifically, quarter-octave bands of pitched noise, and that it increases with
loudness but decreases with pitch. Stevens (1934) observed that pure and complex
tones further possess an attribute of density, which changes with loudness and pitch
in a manner similar to perceptions of brightness: the brighter the tone, the louder
and the denser it is (Boring and Stevens 1936; cf., Zacharakis et al. 2014).
Empirical observations of volume and density perceptions for pure tones have cast
doubt on Schaeffer’s (1966) claim that these have no timbre (Sect. 5.2.1).
Further experiments by Stevens et al. (1965) provided empirical support to
Koechlin’s claim that density is proportional to loudness and inversely proportional
to volume (Sect. 5.2.1). An inverse relation between spectral centroid and volume
was observed, which has been confirmed by Chiasson et al. (2017). They found that
high energy concentrated in low frequencies tends to increase perceived volume,
whereas low energy more spread out in higher frequencies tends to decrease it.
Similarly, Saitis et al. (2015) showed that violin notes characterized as rich tended
to have a low spectral centroid or stronger second, third, and fourth harmonics, or a
predominant fundamental. Given that in harmonic sounds the fundamental is the
lowest frequency, these findings generally agree with Helmholtz’s (1877) claim that
the stronger versus weaker the fundamental is relative to the upper partials, the
richer versus poorer the sound is perceived.
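The relation among loudness, volume, and density stated at the start of the preceding paragraph can be written compactly. The symbols L, V, and D are shorthand introduced here for loudness, volume, and density; the product form follows the title of Stevens et al. (1965), and any constants of proportionality are left unspecified.

```latex
L \propto V \cdot D
\quad\Longleftrightarrow\quad
D \propto \frac{L}{V}
```

Read this way, at a fixed loudness a sound with larger volume is less dense, and at a fixed volume a louder sound is denser.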
In the interlanguage study of Zacharakis et al. (2014, 2015), the overall configura-
tional and dimensional similarity between semantic and perceptual spaces in both
the English and Greek groups illustrates that the way timbre is conceptualized and
communicated can indeed capture some aspects of the perceptual structure within
a set of timbres, and that native language has very little effect on the perceptual
and semantic processing involved, at least for the two languages tested. There also
seems to be some agreement regarding the number and labeling of dimensions
with studies in German (von Bismarck 1974a; Štěpánek 2006), Czech (Moravec
and Štěpánek 2005; Štěpánek 2006), Swedish (Nykänen et al. 2009), and French
(Faure 2000; Lavoie 2013). Chiasson et al. (2017) found no effect of native lan-
guage (French versus English) on perceptions of timbral volume. All these studies
were conducted with groups of Western listeners and with sounds from Western
musical instruments. Further evidence on whether language (but also culture) influences
timbre semantics comes from research involving non-Western listeners and
non-Western timbres.
Giragama et al. (2003) asked native speakers of English, Japanese, Bengali
(Bangladesh), and Sinhala (Sri Lanka) to provide dissimilarity and semantic ratings
of six electroacoustic sounds (one guitar sound processed with six effects). Multidimensional
analyses yielded a two-dimensional MDS space shared across the four groups and
two semantic factors (sharp/clear and diffuse/weak) whose order and scores varied
moderately between languages and related differently to the MDS space. For
Bengali and Sinhala, both Indo-Aryan languages, the similarity between the respec-
tive semantic spaces was much stronger, and they correlated better with the MDS
space than for any other language pair, including between the Indo-European
English and Indo-Aryan relatives. Furthermore, the sharp/clear and diffuse/weak
factors closely matched the semantic space of electroacoustic textures found by
Grill (2012), whose study was conducted with native German speakers.
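The step from pairwise dissimilarity ratings to a low-dimensional timbre space can be sketched with classical (Torgerson) MDS. This is an illustrative stand-in: the text here does not state which MDS variant Giragama et al. (2003) used, and the six-item dissimilarity matrix below is synthetic toy data generated from random planar coordinates, not listener ratings. The function name `classical_mds` is likewise an assumption.

```python
import numpy as np

def classical_mds(dissimilarities, n_dims=2):
    """Embed items in `n_dims` dimensions from a symmetric dissimilarity matrix."""
    d = np.asarray(dissimilarities, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_dims]
    scales = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scales

# Synthetic toy data: six "sounds" placed at random points in a plane.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 2))
toy_dissim = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(toy_dissim, n_dims=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(coords.shape, float(np.abs(recovered - toy_dissim).max()))
```

Because the toy dissimilarities are exact Euclidean distances between planar points, two dimensions reproduce them up to rotation and reflection; real ratings would generally leave residual error, and comparing spaces across listener groups would need an additional alignment step.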
Alluri and Toiviainen (2010) found a three-dimensional semantic timbre space of
activity (strong–weak, soft–hard), brightness (dark–bright, colorless–colorful), and
fullness (empty–full) for Indian pop music excerpts rated by Western listeners who
had low familiarity with the genre. Here timbre refers to timbral mixtures arising
from multiple-source sounds. Both the number and nature of these dimensions are
in good agreement with Zacharakis et al. (2014). Furthermore, similar semantic
spaces were obtained across two groups of Indian and Western listeners and two sets
of Indian and Western pop music excerpts (Alluri and Toiviainen 2012). Acoustic
analyses also gave comparable results between the two cultural groups and between
the two studies. Intrinsic dimensionality estimation revealed a higher number of
semantic dimensions for music from one’s own culture compared to a culture that
one is less familiar with, suggesting an effect of enculturation. Furthermore,
Iwamiya and Zhan (1997) found common dimensions of sharpness (sharp–dull,
bright–dark, distinct–vague, soft–hard), cleanness (clear–muddy, fine–rough), and
spaciousness (rich–poor, extended–narrow) for music excerpts rated separately by
Japanese and Chinese native speakers (type of music used was not reported). These
dimensions appear to modestly match those found by Alluri and Toiviainen (2010)
and by Zacharakis et al. (2014).
Taken as a whole, these (limited) results suggest that conceptualization and com-
munication of timbral nuances is largely language independent, but some culture-
driven linguistic divergence can occur. As an example, Zacharakis et al. (2014)
found that, whereas sharp loaded highest on the luminance factor in English, its
Greek equivalent οξύς (oxýs) loaded higher on the texture dimension of the respec-
tive semantic space. Greek listeners also associated παχύς (pakhús), the Greek
equivalent of thick, with luminance rather than mass. Furthermore, a well-known
discrepancy exists between German and English concerning the words Schärfe and
sharpness, respectively (see Kendall and Carterette 1993a, p. 456). Whereas Schärfe
refers to timbre, its English counterpart pertains to pitch. On the one hand, such dif-
ferences between languages may not imply different mental (nonlinguistic) repre-
sentations of timbre but rather reflect the complex nature of meaning.
On the other hand, there exists evidence that language and culture can play a
causal role in shaping nonlinguistic representations of sensory percepts, for exam-
ple, auditory pitch (Dolscheid et al. 2013). This raises a crucial question concerning
the use of verbal attributes by timbre experts such as instrument musicians: To what
extent does experience with language influence mental representations of timbre?
Based on their findings, Zacharakis et al. (2015) hypothesized that “there may exist
a substantial latent influence of timbre semantics on pairwise dissimilarity judgements” (p. 408).
This seems to be supported by comparisons between general
dissimilarity, brightness dissimilarity, and brightness scaling data by Saitis and
Siedenburg (in preparation), but more research is needed to better understand the
relationship between linguistic and nonlinguistic representations of timbre.
Nevertheless, semantic attributes, such as brightness, roughness, and fullness, appear
generally unable to capture the salient perceptual dimension of timbre responsible
for discriminating between sustained and impulsive sounds (Zacharakis et al. 2015).
5.6 Summary
Timbre is one of the most fundamental aspects of acoustic communication and yet
it remains one of the most poorly understood. Despite being an intuitive concept,
timbre covers a very complex set of auditory attributes that are not accounted for by
Acknowledgements Charalampos Saitis wishes to thank the Alexander von Humboldt Foundation
for support through a Humboldt Research Fellowship.
Compliance with Ethics Requirements Charalampos Saitis declares that he has no conflict of
interest.
Stefan Weinzierl declares that he has no conflict of interest.
References
Abeles H (1979) Verbal timbre descriptors of isolated clarinet tones. Bull Council Res Music Educ
59:1–7
Albersheim G (1939) Zur Psychologie der Toneigenschaften (On the psychology of sound properties).
Heitz, Strassburg
Alluri V, Toiviainen P (2010) Exploring perceptual and acoustical correlates of polyphonic timbre.
Music Percept 27:223–242
Alluri V, Toiviainen P (2012) Effect of enculturation on the semantic and acoustic correlates of
polyphonic timbre. Music Percept 29:297–310
Almeida A, Schubert E, Smith J, Wolfe J (2017) Brightness scaling of periodic tones. Atten Percept
Psychophys 79(7):1892–1896
Bell R (2015) PAL: the percussive audio lexicon. An approach to describing the features of
percussion instruments and the sounds they produce. Dissertation, Swinburne University of
Technology
Bensa J, Dubois D, Kronland-Martinet R, Ystad S (2005) Perceptive and cognitive evaluation of a
piano synthesis model. In: Wiil UK (ed) Computer music modelling and retrieval. 2nd interna-
tional symposium, Esbjerg, May 2004. Springer, Heidelberg, pp 232–245
Bernays M, Traube C (2009) Expression of piano timbre: verbal description and gestural control.
In: Castellengo M, Genevois H (eds) La musique et ses instruments (Music and its instru-
ments). Delatour, Paris, pp 205–222
Bernays M, Traube C (2011) Verbal expression of piano timbre: multidimensional semantic space
of adjectival descriptors. In: Williamon A, Edwards D, Bartel L (eds) Proceedings of the inter-
national symposium on performance science 2011. European Association of Conservatoires,
Utrecht, pp 299–304
Bloothooft G, Plomp R (1988) The timbre of sung vowels. J Acoust Soc Am 84:847–860
Boring EG, Stevens SS (1936) The nature of tonal brightness. Proc Natl Acad Sci 22:514–521
Bowles EA (1954) Haut and bas: the grouping of musical instruments in the middle ages. Music
Discip 8:115–140
Bowling DL, Purves D, Gill KZ (2018) Vocal similarity predicts the relative attraction of musical
chords. Proc Natl Acad Sci 115:216–221
Bregman AS (1990) Auditory scene analysis. The perceptual organization of sound. MIT Press,
Cambridge
Brent W (2010) Physical and perceptual aspects of percussive timbre. Dissertation, University of
California
Burton H (2016) Menuhin: a life, revised edn. Faber & Faber, London
Chiasson F, Traube C, Lagarrigue C, McAdams S (2017) Koechlin’s volume: perception of sound
extensity among instrument timbres from different families. Music Sci 21:113–131
Cousineau M, Carcagno S, Demany L, Pressnitzer D (2014) What is a melody? On the relationship
between pitch and brightness of timbre. Front Syst Neurosci 7:127
Daniel P, Weber R (1997) Psychoacoustical roughness: implementation of an optimized model.
Acta Acust United Ac 83:113–123
Douglas C, Noble J, McAdams S (2017) Auditory scene analysis and the perception of sound mass
in Ligeti’s continuum. Music Percept 33:287–305
de Ceuster D (2016) The phenomenological space of timbre. Dissertation, Utrecht University
Disley AC, Howard DM, Hunt AD (2006) Timbral description of musical instruments. In: Baroni
M, Addessi AR, Caterina R, Costa M (eds) Proceedings of the 9th international conference on
music perception and cognition, Bologna, 2006
Dolan EI (2013) The orchestral revolution: Haydn and the technologies of timbre. Cambridge
University Press, Cambridge
Dolscheid S, Shayan S, Majid A, Casasanto D (2013) The thickness of musical pitch: psychophysi-
cal evidence for linguistic relativity. Psychol Sci 24:613–621
Dubois D (2000) Categories as acts of meaning: the case of categories in olfaction and audition.
Cogn Sci Q 1:35–68
Edwards RM (1978) The perception of trombones. J Sound Vib 58:407–424
Eitan Z, Rothschild I (2011) How music touches: musical parameters and listeners’ audio-tactile
metaphorical mappings. Psychol Music 39:449–467
Elliott TM, Hamilton LS, Theunissen FE (2013) Acoustic structure of the five perceptual dimen-
sions of timbre in orchestral instrument tones. J Acoust Soc Am 133:389–404
Fastl H, Zwicker E (2007) Psychoacoustics: facts and models, 3rd edn. Springer, Heidelberg
Faure A (2000) Des sons aux mots, comment parle-t-on du timbre musical? (From sounds to words,
how do we speak of musical timbre?). Dissertation, École des hautes études en sciences sociales
Fritz C, Blackwell AF, Cross I et al (2012) Exploring violin sound quality: investigating English
timbre descriptors and correlating resynthesized acoustical modifications with perceptual prop-
erties. J Acoust Soc Am 131:783–794
Gallese V, Lakoff G (2005) The brain’s concepts: the role of the sensory-motor system in concep-
tual knowledge. Cogn Neuropsychol 22:455–479
Garnier M, Henrich N, Castellengo M et al (2007) Characterisation of voice quality in Western
lyrical singing: from teachers’ judgements to acoustic descriptions. J Interdiscipl Music Stud
1:62–91
Giragama CNW, Martens WL, Herath S et al (2003) Relating multilingual semantic scales to a
common timbre space – part II. Paper presented at the 115th audio engineering society conven-
tion, New York, 10–13 October 2003
Moravec O, Štěpánek J (2003) Verbal description of musical sound timbre in Czech language.
In: Bresin R (ed) Proceedings of the Stockholm Music Acoustics Conference 2003. KTH,
Stockholm, pp 643–646
Moravec O, Štěpánek J (2005) Relations among verbal attributes describing musical sound timbre
in Czech language. In: Proceedings of Forum Acusticum Budapest 2005: the 4th European
congress on acoustics. Hirzel, Stuttgart, pp 1601–1606
Mores R (2011) Nasality in musical sounds – a few intermediate results. In: Schneider A, von
Ruschkowski A (eds) Systematic musicology: empirical and theoretical studies. Peter Lang,
Frankfurt am Main, pp 127–136
Nykänen A, Johansson Ö, Lundberg J, Berg J (2009) Modelling perceptual dimensions of saxo-
phone sounds. Acta Acust United Ac 95:539–549
Osgood CE (1952) The nature and measurement of meaning. Psychol Bull 49:197–237
Porcello T (2004) Speaking of sound: language and the professionalization of sound-recording
engineers. Soc Studies Sci 34:733–758
Pratt RL, Bowsher JM (1978) The subjective assessment of trombone quality. J Sound Vib
57:425–435
Pratt RL, Doak PE (1976) A subjective rating scale for timbre. J Sound Vib 45:317–328
Pressnitzer D, McAdams S (1999) Two phase effects in roughness perception. J Acoust Soc Am
105:2773–2782
Radocy RE, Boyle JD (eds) (2012) Psychological foundations of musical behavior, 5th edn.
Thomas Books, Springfield
Reuter C (1997) Karl Erich Schumann’s principles of timbre as a helpful tool in stream segregation
research. In: Leman M (ed) Music, gestalt, and computing. Studies in cognitive and systematic
musicology. Springer, Heidelberg, pp 362–372
Reybrouck M (2013) From sound to music: an evolutionary approach to musical semantics.
Biosemiotics 6:585–606
Rich GJ (1916) A preliminary study of tonal volume. J Exp Psychol 1:13–22
Rioux V, Västfjäll D (2001) Analyses of verbal descriptions of the sound quality of a flue organ
pipe. Music Sci 5:55–82
Rozé J, Aramaki M, Kronland-Martinet R, Ystad S (2017) Exploring the perceived harshness of
cello sounds by morphing and synthesis techniques. J Acoust Soc Am 141:2121–2136
Saitis C, Fritz C, Guastavino C, Giordano BL, Scavone GP (2012) Investigating consistency
in verbal descriptions of violin preference by experienced players. In: Cambouropoulos E,
Tsougras C, Mavromatis P, Pastiadis K (eds) Proceedings of the 12th international conference
on music perception and cognition and 8th triennial conference of the European Society for the
Cognitive Sciences of Music, Thessaloniki
Saitis C, Fritz C, Guastavino C, Scavone GP (2013) Conceptualization of violin quality by experi-
enced performers. In: Bresin R, Askenfelt A (eds) Proceedings of the Stockholm music acous-
tics conference 2013. Logos, Berlin, pp 123–128
Saitis C, Fritz C, Scavone GP et al (2017) Perceptual evaluation of violins: a psycholinguis-
tic analysis of preference verbal descriptions by experienced musicians. J Acoust Soc Am
141:2746–2757
Saitis C, Järveläinen H, Fritz C (2018) The role of haptic cues in musical instrument quality per-
ception. In: Papetti S, Saitis C (eds) Musical Haptics. Springer, Cham, pp 73–93
Saitis C, Scavone GP, Fritz C, Giordano BL (2015) Effect of task constraints on the perceptual
evaluation of violins. Acta Acust United Ac 101:382–393
Samoylenko E, McAdams S, Nosulenko V (1996) Systematic analysis of verbalizations produced
in comparing musical timbres. Int J Psychol 31:255–278
Schaeffer P (1966) Traité des objets musicaux: essai interdisciplines. Editions du Seuil, Paris.
English edition: Schaeffer P (2017) Treatise on musical objects: an essay across disciplines
(trans: North C, Dack J). University of California Press, Oakland
Schneider A (1997) “Verschmelzung”, tonal fusion, and consonance: Carl Stumpf revisited. In:
Leman M (ed) Music, gestalt, and computing. Springer, Berlin, pp 117–143
Schumann KE (1929) Physik der Klangfarben (Physics of timbres). Habilitation, Universität Berlin
Serra X (1997) Musical sound modelling with sinusoids plus noise. In: Roads C, Pope S, Piccialli
A, de Poli G (eds) Musical signal processing. Swets Zeitlinger, Lisse, pp 91–122
Simner J, Cuskley C, Kirby S (2010) What sound does that taste? Cross-modal mappings across
gustation and audition. Perception 39:553–569
Slawson W (1985) Sound color. University of California Press, Berkeley
Smalley D (1997) Spectromorphology: explaining sound-shapes. Organised Sound 2:107–126
Solomon LN (1958) Semantic approach to the perception of complex sounds. J Acoust Soc Am
30:421–425
Štěpánek J (2006) Musical sound timbre: verbal description and dimensions. In: Proceedings of the
9th international conference on digital audio effects. McGill University, Montreal, pp 121–126
Štěpánek J, Otčenášek Z (1999) Rustle as an attribute of timbre of stationary violin tones. Catgut
Acoust Soc J (Series II) 3:32–38
Stevens SS (1934) Tonal density. J Exp Psychol 17:585–592
Stevens SS, Guirao M, Slawson AW (1965) Loudness, a product of volume times density. J Exp
Psychol 69:503–510
Stumpf C (1890) Tonpsychologie (Psychology of sound), vol 2. Hirzel, Leipzig
Stumpf C (1898) Konsonanz und Dissonanz (Consonance and dissonance). Barth, Leipzig
Sundberg J (2013) Perception of singing. In: Deutsch D (ed) The psychology of music, 3rd edn.
Academic, London, pp 69–105
Susini P, Lemaitre G, McAdams S (2012) Psychological measurement for sound description and
evaluation. In: Berglund B, Rossi GB, Townsend JT, Pendrill LR (eds) Measurement with
persons: theory, methods, and implementation areas. Psychology Press, New York, pp 227–253
Terrace HS, Stevens SS (1962) The quantification of tonal volume. Am J Psychol 75:596–604
Traube C (2004) An interdisciplinary study of the timbre of the classical guitar. Dissertation,
McGill University
Thiering M (2015) Spatial semiotics and spatial mental models. Figure-ground asymmetries in
language. De Gruyter Mouton, Berlin
Vassilakis PN, Kendall RA (2010) Psychoacoustic and cognitive aspects of auditory roughness:
definitions, models, and applications. In: Rogowitz BE, Pappas TN (eds) Human vision and
electronic imaging XV. SPIE/IS&T, Bellingham/Springfield, p 75270
von Bismarck G (1974a) Timbre of steady tones: a factorial investigation of its verbal attributes.
Acustica 30:146–159
von Bismarck G (1974b) Sharpness as an attribute of the timbre of steady sounds. Acustica
30:159–172
Wake S, Asahi T (1998) Sound retrieval with intuitive verbal expressions. Paper presented at the
5th international conference on auditory display, University of Glasgow, 1–4 November 1998
Walker P (2016) Cross-sensory correspondences: a theoretical framework and their relevance to
music. Psychomusicology 26:103–116
Wallmark Z (2014) Appraising timbre: embodiment and affect at the threshold of music and noise.
Dissertation, University of California
Wallmark Z (2018) A corpus analysis of timbre semantics in orchestration treatises. Psychol
Music. https://doi.org/10.1177/0305735618768102
Walsh V (2013) Magnitudes, metaphors, and modalities: a theory of magnitude revisited. In:
Simner J, Hubbard E (eds) Oxford handbook of synesthesia. Oxford University Press, Oxford,
pp 837–852
Webster J, Woodhead M, Carpenter A (1970) Perceptual constancy in complex sound identifica-
tion. Br J Psychol 61:481–489
Weinzierl S, Lepa S, Ackermann D (2018a) A measuring instrument for the auditory perception
of rooms: the Room Acoustical Quality Inventory (RAQI). J Acoust Soc Am 144:1245–1257
Weinzierl S, Lepa S, Schultz F et al (2018b) Sound power and timbre as cues for the dynamic
strength of orchestral instruments. J Acoust Soc Am 144:1347–1355