Book Review: David HURON. Voice Leading: The Science Behind A Musical Art
Fabian C. Moss
École Polytechnique Fédérale de Lausanne
All content following this page was uploaded by Fabian C. Moss on 26 November 2017.
Cambridge, MA: MIT Press, 2016. ISBN 9780262034852. 272 pp. $38.
___________________________________________________________________________
David Huron’s book on voice leading is the state-of-the-art account of the psychological
principles that govern the perception of individual voices in a piece of music. It is not yet
another set of instructions for part writing in Bach chorale style. Rather, it is the culmination of
decades of scientific research, a great deal of it done by Huron himself,1 showing how much
of the traditional canon of voice-leading rules can indeed be explained by empirical research
on auditory perception. It bears mentioning that Huron understands voice leading as a rather
broad concept and loosely defines it as “the art of combining concurrent musical lines or
melodies” (1). The fundamental thesis of Huron’s book states that voice-leading rules are
These principles may serve as guidelines for any composer pursuing the goal of maintaining a
coherent auditory scene—that is, one with clearly distinguishable individual voices at any
given moment. Of course, Huron is aware that composers may or may not choose to follow
the strategies that benefit auditory scene analysis, much as a painter might or might not
follow the rules of perspective. But taking into account the multitude of all potential
Financial support for the author has been provided by the Zukunftskonzept at TU Dresden
https://doi.org/10.1525/mp.2001.19.1.1.
compositional goals (for example, as influenced by social, economic, or aesthetic factors)
would go beyond the confines of a single coherent theory. Consequently, Voice Leading
restricts itself to one of them: the goal of creating coherent auditory scenes. In doing so,
Huron grounds his approach in Albert Bregman’s psychoacoustical theory of auditory scene
analysis.2 He aims to “[p]rovide a scientific explanation for the core part-writing rules” as
psychological findings. This book thus bridges the still large gap between music theory and
music psychology by emphasizing that a better understanding of how we listen can lead to a
better understanding of why music is created in a particular way (i.e., why certain voice-
leading rules pertain to the canon, or why certain composers chose to write in a particular
style).3
voice-leading rules, Huron explicitly stresses that his book should by no means be understood
as laying out the headgear for composers. Hence, his approach is fundamentally not
prescriptive but is meant as a “roadmap that describes what happens when a musician
chooses one path rather than another” (2) in order to accomplish a specific musical goal. This
argument leaves the whole issue of the composer’s free will untouched, but imposes high
demands on his or her rationality. Importantly, this approach is not restricted to the canon of
the common-practice period but extends to other styles and genres. By showing how each of
2 Albert Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound
this musical goal, they become meaningful, reasonable, and open to scrutiny.
Voice Leading is divided into seventeen chapters. All of the core chapters conclude
with a reprise that summarizes the essential points and integrates the findings into the
overarching context.
Chapters 2 and 3 lay out the basic music-theoretical and psychological framework.
After pointing out the ubiquity of disagreement over which voice-leading rules the canon
should include, chapter 2 (“The Canon”) introduces a preliminary set of sixteen rules
music theory classes nowadays. Chapter 3, “Sources and Images,” introduces basic concepts
from auditory perception. Huron describes in a concise and very accessible manner the
phenomenon of auditory scene analysis: how sounds from acoustic sources, made up of
elementary partial tones, find their way through the inner ear into the brain to form auditory
images (e.g., the percept of a piano tone) or streams (e.g., the percept of an ascending major
scale played by a harp), drawing a sharp line between the acoustic (physical) and the auditory
(physiological and psychological) realms. The listener’s reward for successful auditory scene
analysis is a subjective feeling of pleasure. “Successful” in this sense does not necessarily
mean that the auditory scene is in one-to-one correspondence with the acoustic reality, but
rather that the evoked auditory images are consistent with one another (leaving no room for
ambiguities). Relating successful auditory scene analysis to the experience of pleasure lies at
the heart of Huron’s explanation of voice-leading rules: they increase pleasure by facilitating
Chapters 4, 5, and 6 provide a more detailed account of how auditory scenes are
two. The first is harmonic fusion, which describes the mental phenomenon whereby a single
auditory image is formed from multiple sound sources. Harmonic fusion depends mostly on
the frequency ratio of the respective fundamentals of the sounds: tones in harmonic, or
consonant, distance are more prone to fusion. The second, the toneness principle, describes
the degree to which a sound sounds “like a tone” (as opposed to noise) and has a clear pitch.
It turns out that toneness for complex tones (e.g., as produced by musical instruments) is high
for pitches between E2 and G5 (roughly the maximal ambitus of a chorale) and optimal
around D4, located roughly at the “middle” of the staff. Chapter 5 covers the topic of
auditory masking, which describes the potential interference of several sound sources with
one another. Masking occurs when the partial frequencies of two or more tones are too close
to each other (less than a critical bandwidth) to be resolved into separate auditory images.
The effect is that one tone (partially) “masks” another. Hence, a composer who wants to
convey several independent voices must design them in a way that avoids auditory masking,
following the minimum masking principle. Artifacts of this strategy are the voicing
techniques in different registers (true to the motto “the lower you get, the more you spread”),
and the fact that most listeners can focus more easily on the melody when it appears in the
soprano because in general, for voices similar in loudness and timbre, higher-pitched voices
are less affected by masking from lower-pitched voices than vice versa (the high-voice
superiority effect). Since the degree of masking correlates with the difficulty of distinguishing
among several distinct voices, masking creates a “feeling of irritation or annoyance” (52),
which in turn might lead to some form of behavioral reaction (leaving the room, asking
Huron favors the term “(sensory) irritation” over “sensory dissonance,” commonly used in
4 Note that masking can occur not only in musical contexts but in all hearing circumstances.
and describes phenomena leading to avoidance behavior, dissonance in music is not a
criterion for quality: “The best music is not music that avoids dissonance” (56). In other
words, there is a lot of good music that is quite dissonant. Having considered so far two or
(“Connecting the Dots”) to the succession of such musical events in time. Mentally
connecting multiple events to form a single stream is facilitated when the events are either
contiguous or separated by only a short duration of about 800 milliseconds (the continuity
principle). The continuity principle is at work, for instance, in latent polyphony: as is well
known, a single monophonic instrument (such as a human voice, a violin, or a cello) can
evoke the impression of several distinct lines. This effect depends on both the relative
duration and the pitch distance between the virtual voices. If both factors are below a certain
threshold (“trill boundary” [72]), the result is the perception of a single stream. If both are
above another threshold (“yodel boundary” [72]), the result is two or more separate streams.
This is reflected, for example, in the theoretical step/leap distinction and also in the fact that
in many of the world’s cultures, melodies are usually connected by steps5—that is, intervals
below the trill threshold. Huron calls this observation the pitch proximity principle. Similar
or parallel motion also contributes to grouping tones together. Importantly, this holds not
only for complex tones (e.g., as entities in contrapuntal contexts), but also for all the partials
of each of the moving complex tones. They do so most strongly when they are moving
strictly parallel and are harmonically related (e.g., unisons, octaves, fifths, and so on). The
conjoint motion of two tones is called co-modulation and forms Huron’s sixth perceptual
principle. Co-modulation is strongest if the two voices move in parallel. Obviously, this goes
directly against the perception of independent voices. It is easy to see why this principle is at
5 Huron, “Tone and Voice,” 25.
the basis of the strong and long-standing advice to novice composers to avoid parallel octaves
and fifths.
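
The traditional advice against parallel octaves and fifths can be made concrete with a short check. The following is only a minimal sketch, not a procedure taken from Huron's book: it assumes two voices given as equal-length lists of MIDI note numbers, and the function name `parallel_perfects` is hypothetical. It flags every move in which both voices travel in the same direction while keeping the same perfect interval class, and it treats compound intervals like their simple counterparts.

```python
# Sketch under stated assumptions: voices are equal-length lists of MIDI
# note numbers; the names used here are illustrative, not from the book.
PERFECT_CLASSES = {0: "unison/octave", 7: "fifth"}


def parallel_perfects(upper, lower):
    """Return (index, interval name) for each move into a parallel perfect.

    A move is flagged when both voices change pitch in the same direction
    and the interval class (mod 12) is the same perfect class before and
    after the move.
    """
    if len(upper) != len(lower):
        raise ValueError("voices must have the same number of notes")
    hits = []
    for i in range(1, len(upper)):
        prev_class = (upper[i - 1] - lower[i - 1]) % 12
        curr_class = (upper[i] - lower[i]) % 12
        upper_move = upper[i] - upper[i - 1]
        lower_move = lower[i] - lower[i - 1]
        same_direction = upper_move * lower_move > 0
        if same_direction and prev_class == curr_class and curr_class in PERFECT_CLASSES:
            hits.append((i, PERFECT_CLASSES[curr_class]))
    return hits


# C4-G4 moving to D4-A4: both voices rise a step and stay a fifth apart.
print(parallel_perfects([67, 69], [60, 62]))
```

Contrary or oblique motion is never flagged by this check, which mirrors the reasoning above: only conjoint motion of harmonically related tones invites the two voices to fuse.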
auditory images introduced up to this point are thus harmonic fusion, toneness, minimum
Rules”) builds on the established framework and re-evaluates the voice-leading rules as
preference rules to “clarify the logic, address the pertinent details, and make it easier to see
unanticipated repercussions” (87). The concept of preference rules follows the tradition of A
Generative Theory of Tonal Music and The Cognition of Basic Musical Structures,6 both of
which point out that these should be understood as guidelines rather than prescriptive
directives. The objective of Voice Leading is to show which auditory principles facilitate the
perception of individual voices in music, a central goal in many musical styles. Accordingly,
the preference rules that Huron introduces are empirically based means to achieve this goal
G1. The goal of voice leading is to facilitate the listener’s mental construction of
coherent auditory scenes when listening to music. In practical terms, the goal of voice
leading is to create two or more concurrent yet perceptually distinct “parts,” “voices,”
or “textures.” (88)
Starting from this goal, Huron derives a set of twenty-three preference rules by successively
observations (E) and corollaries (C). The implications include well-known compositional
6 Fred Lerdahl and Ray Jackendoff, A Generative Theory of Tonal Music (Cambridge, MA:
MIT Press, 1983); David Temperley, The Cognition of Basic Musical Structures (Cambridge,
principle mentioned above. They also contain less obvious preference rules such as the
“toneness rule,” which requires “[p]refer[ring] the use of harmonic complex tones—tones
that evoke clear pitch sensations” (89) (obviously, this rule is tacitly followed when writing
tonal music). Furthermore, they include new preference rules that, according to Huron,
exceed the extent of the traditional canon, such as the “Oblique Preparation Rule,” meaning
retain the same pitch in one of the voices (i.e. approach by oblique motion)” (94). Probably
the most famous voice-leading rule is the “Perfect Parallels Rule” (95), which Huron
elegantly deduces from the principles of harmonic fusion and pitch co-modulation. In each
case, Huron points out how the rule helps to achieve the above-stated goal. Again, he
emphasizes that one should be careful not to confuse this particular goal of voice leading with
Out of context, the enumeration of statements in this chapter might seem tedious and
mechanical, even arbitrary (the classic criticisms of all kinds of compositional rules). But
given the background of the empirical research provided in this book, the reader is easily
convinced that voice-leading rules were not randomly set up once upon a time by music
theorists but are indeed shaped by basic features of our auditory perception and hence
to the formation of separate auditory streams: onset asynchrony (as in polyphony), limited
density (the fact that it is hard even for professional musicians to identify more than three
concurrent voices), timbral differentiation (it is easier to disentangle a violin and a trumpet
than two flutes), and finally, source location (spatial cues for voice segregation). These
“auxiliary principles” partially account for stylistic differences (e.g., homophony vs.
polyphony) and individual choices by composers. Consequently, this approach offers a useful
framework for comparing different musical styles, including popular and non-Western styles.
Following the more general observations of the previous sections, chapters 9–12 go
into more detail and consider specific functions of single tones. Chapter 9 (“Embellishing
Tones”) is dedicated to the function of embellishing tones such as suspensions, passing tones,
neighbor notes, and pedal tones, and shows how they contribute to the individuality of
streams, for instance by drawing attention to a middle voice. This is summarized in the
attention principle and three more preference rules. Chapter 10 (“The Feeling of Leading”)
focuses on “tendency tones” (e.g., leading tones and chromatic alterations) and distinguishes
expectations resulting from the familiarity with particular musical styles, and dynamic-
memory (e.g., expecting that a repetition will be a copy of what we heard earlier, or
determined by bottom-up perceptual principles, whereas “voice leading” also takes advantage
of top-down expectations, which enable listeners to anticipate, for instance, how a melodic
line might continue. In short, “predictability transforms good part-writing into good voice
leading” (144). Six preference rules are derived from this expectation principle. Chapter 11
theorists, there is not much to worry about from a perceptual perspective, since most
7 For more detail, see David Huron, Sweet Anticipation: Music and the Psychology of
into perfect consonances by similar motion (hidden parallels), pointing out the underlying
principles (harmonic fusion, pitch co-modulation, and pitch proximity). Huron explains how
abiding by these principles ensures stream segregation and thus the individuation of voices.
In brief, rules about embellishing and leading tones, as well as chordal-tone doubling and
motion into perfect consonances, can be derived from combinations of only a few basic
perceptual principles.
Having considered the empirical findings on very specific aspects of the treatment of
individual tones in voice leading, chapters 13 and 14 zoom out again and reflect upon entire
like visualization in a hierarchical manner, so-called scene analysis trees. These are meant to
depict how auditory images group together, from resolved partials to tones, chords, and entire
streams, to form coherent auditory scenes. The height of the respective branching refers to the
Crucially, this representation only shows an auditory scene at a given moment in time and
does not capture the dynamic changes of stream formation and segregation over the course of
a piece. Chapter 14 (“Scene Setting”) gives examples of several listening situations and
illustrates them with associated scene-analysis trees (i.e., representations of the mental
hierarchical relationships among auditory images). More precisely, they allow for the
description of “how that [acoustic] scene might be parsed by listeners into a corresponding
auditory scene” (182). Parsing—inferring an underlying structure from a given input (in this
case the sequential presentation of acoustic scenes)—is defined by the principles of auditory
scene analysis, as introduced in the preceding chapters. Conversely, these principles should
be used generatively by composers who wish to create coherent musical scenes in virtually
any genre, since the design of musical scenes is identified as “[o]ne of the main
voice leading provides the toolkit for designing auditory scenes. The rules of voice
leading aren’t simply tools for creating polyphonic music or Baroque-style chorales;
they are tools that allow composers to construct and control any kind of musical
texture. Voice leading truly is the art of combining concurrent musical lines. (179)
So why is it that Baroque-style four-part writing is usually at the core of music theory
courses? Huron conjectures that this is not just a coincidence or a matter of arbitrary
historical traditions. He hypothesizes that it is a consequence of the fact that “[n]o other
historical practice conforms so closely to modern perceptual and cognitive research regarding
how independent sounds are heard in complex acoustic scenes” (181). With this chapter,
Huron concludes the core of the book, having built up a coherent theory of voice leading
starting from elementary observations about how pure and complex tones evoke auditory
images and how they can be combined to form a number of distinguishable auditory streams.
The principles and rules developed along the way specify how composers can achieve the
The discussion is put into a broader context in chapters 15 and 16. Chapter 15
(“The Cultural Connection”) digresses briefly on how learning and culture shape our
auditory perception. By referring to the discussion about universals and innateness,
Huron advocates the view that “learning is the sole source for top-down auditory scene
analysis” (193) and also plays an important role for certain bottom-up processes, thus
emphasizing the importance of the cultural environment for the perception and appreciation
of musical hierarchies and other listening situations such as speech, or rhythmic grouping in
language. In the last chapter (“Ear Teasers”) Huron addresses the question of why the
perceptual individualization of multiple streams should be a musical goal in the first place—
in other words, why voice leading is a desirable musical feature leading to pleasure at all.
Huron argues that “the rules [of voice leading] help to reduce perceptual ambiguity and so
facilitate auditory scene analysis” (197) and “that the brain rewards itself for successfully
parsing auditory scenes and that the evoked pleasure is proportional to the scene’s
complexity” (204). Accordingly, an “ear teaser” is “any complex musical texture or acoustic
scene that nevertheless affords clear scene-analysis parsing, with a consequent pleasurable
effect for listeners who successfully resolve the sensory challenge” (198). While the theory
developed so far accounts for how musical scenes are set, Huron concedes that there is to
date almost no data on how these scenes change over time in a listener’s mind, so this
The conclusion (chapter 17) summarizes the twelve perceptual principles introduced
in the book (thus incidentally providing an excellent introduction) and revisits the two central
concepts of pleasure (as a reward for successful auditory scene analysis) and scene setting (as
a fundamental task for a composer in any style), both of which are modulated by the
workings of the principles that underlie voice leading. He restates the renewed canon of
thirty-seven voice-leading rules and concludes with some remarks on how performers can
take advantage of the research presented in Voice Leading. The book closes with an
afterword giving pedagogical advice on how its content might be integrated into the music
theory curriculum. Somewhat surprisingly, Huron discourages teachers from using his book
as an introduction to the topic of voice leading, arguing that the principles developed in his
book and the psychological underpinnings might be too complex for a novice who is still
struggling with the intricacies of basic part writing. However, to the more advanced student,
the empirical accounts of voice-leading rules might provide valuable insights about their
underlying causes.
The greatest strengths of Voice Leading are both its extensiveness and its
comprehensibility. Covering the whole range from bottom-up sensory principles (such as the
manner without delving into distracting particularities. The relevant phenomena are described
explanations in the form of metaphorical everyday situations. In the same spirit, Huron
refrains from stringing together lists of numbers or formulae. The reader interested in more
detailed accounts can always refer to the extensive references, which cover virtually all
book’s index also allows one to quickly find relevant passages. Voice Leading is thus ideally
suited for a broad audience lacking prior knowledge of empirical research on music
perception, a convenient read for musicians, music theory scholars and teachers, and a
some basic scientific terminology, such as the difference between causation and correlation,
parsing and generation, top-down and bottom-up approaches, and statistical techniques such
as multiple regression. It might, however, come as a surprise for a reader with a background in
music theory that, compared with traditional counterpoint treatises, Voice Leading contains
almost no examples from the musical repertoire. The first score to appear is an excerpt from
Ravel’s Bolero, and a few others follow; but the literate reader will have no difficulty
transferring the findings to other examples. It serves Huron’s goal to show that the voice-
leading principles he explains extend far beyond a narrow corpus of Renaissance polyphonic
rules of voice leading is based on general auditory principles, the plausibility of the goal of
voice leading (creating coherent auditory scenes) is largely accounted for by his own
extensive corpus studies of the music of Bach. Naturally, this calls for more extensive work
on Bach’s precursors and contemporaries, as well as later composers and different styles, to
draw conclusions about the diachronic changes of the usage of voice leading in order to draw
hierarchical clustering, of how elements clump together to form larger units according to
certain principles (in this case the principles underlying the formation of auditory streams).
The distinction between analytic and synthetic hearing related to the branch length is
conceptually evident but somewhat imprecise, since this distance is not quantified as in a real
dendrogram plot. This approach deserves more attention in subsequent research and ought to
model dynamic processes in time, but rather depict momentary analyses. Owing to an
unfortunate gap in the research literature, the temporal dimension is still opaque, but as
Huron writes, “[T]he study of voice leading remains a work in progress rather than a finished
opus. With future research, the interpretations I have offered in this book will be corrected,
augmented, or replaced” (214), and new forms of representation might turn out to be more
hypotheses, Huron explicitly encourages scholars to continue and extend the scientific
rules in chapter 11. As already mentioned, he argues that avoiding chordal-tone doubling
(especially the doubling of the leading tone) mainly helps to prevent parallel octaves. His
argument, though, relies on corpus studies of works by Bach, Haydn, and Mozart,8 and,
ratios, rather than the broader theoretical conception of an octave as an interval spanning
seven scale steps, no matter its specific size and acoustical realization. Doubtless, traditional
voice-leading rules use the concept of the octave ambiguously and rely on both definitions,
which makes it plausible that they also were formulated to prevent other consequences of
tone doubling than octave parallels. In modal Renaissance music, in particular, doubling can
lead to cross-relations because of musica ficta, where the direction of the voice determines its
realization as natural or altered.9 As an example, take the extreme case of the end of Thomas
Tallis’s hymn O nata lux (Example 1), where the cross-relation between two tones sounds
clausula coincides briefly with the F♮ in the Phrygian tenor clausula (a mi contra fa
situation). This sharply dissonant vertical major-minor sonority is inharmonic (in Huron’s
terminology, which is related to the configuration of the partial tones) and momentarily
suspends the harmonic fusion and toneness principles. Subsequently, instead of resolving the
dissonance into a consonance, the F in the tenor descends to E♭, creating a minor second
in the lower register. According to the minimum masking principle, they should
8 David Huron, “Chordal-Tone Doubling and the Enhancement of Key Perception,”
Paul T. von Hippel, “Rules of Chord-Tone Doubling (and Spacing): Which Ones Do We
http://www.mtosmt.org/issues/mto.04.10.2/mto.04.10.2.aarden_hippel.html.
9 See David Trendell, “After Josquin,” Early Music 35/1 (2007), 139–41,
https://doi.org/10.1093/em/cal120.
approximately form at least a perfect fourth to exceed the critical bandwidth and to be clearly
separated by the cochlea into two distinct tones (43, Figure 5.2).
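
The register dependence of this argument can be illustrated numerically. The sketch below is not Huron's Figure 5.2. As a stand-in for the critical bandwidth it uses the equivalent rectangular bandwidth (ERB) approximation of Glasberg and Moore, a related but different estimate, and it compares fundamentals only, ignoring the upper partials that also contribute to masking. The pitches (D3 as MIDI note 50, equal temperament, A4 = 440 Hz) are hypothetical and are not taken from the Tallis example.

```python
# Sketch under stated assumptions: ERB stands in for the critical bandwidth,
# only fundamentals are compared, and pitches are equal-tempered MIDI notes.

def erb_hz(center_hz):
    """Equivalent rectangular bandwidth in Hz at a given center frequency."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def midi_to_hz(midi_note, a4_hz=440.0):
    """Equal-tempered frequency of a MIDI note number (69 = A4)."""
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


def within_one_band(midi_a, midi_b):
    """True if the two fundamentals lie less than one ERB apart."""
    freq_a, freq_b = midi_to_hz(midi_a), midi_to_hz(midi_b)
    center = (freq_a + freq_b) / 2.0
    return abs(freq_a - freq_b) < erb_hz(center)


# Hypothetical low-register pair: a minor second (D3, E-flat3)
# versus a perfect fourth (D3, G3).
print(within_one_band(50, 51))
print(within_one_band(50, 55))
```

Under these assumptions the low minor second falls well inside a single band, whereas the perfect fourth above the same note lies just outside it, which is broadly in line with the spacing the reviewer describes.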
Admittedly, examples like this one are rare and really stand out, confirming that our
ears (more precisely, all components involved in the creation of auditory scenes) rely on
perceptual principles in the first place. Furthermore, the dissonant intervals are approached by
oblique motion, providing a listener with another cue for resolving the voices: dissonance
examples like this one have little impact on the overall statistics. But, more important, one
can clearly see how it is possible for a composer to abandon some principles for the sake of
others, if musical goals other than stream segregation are pursued. This is in total compliance
with Huron’s constant mantra that voice leading is but one of a virtually infinite array of
potential compositional goals, albeit an important one, since it neatly relates to biological and
between music theory and music psychology. Owing to the growth of the discipline of music
cognition in recent decades, this vital issue has been addressed several times by such
researchers as Ray Jackendoff and Fred Lerdahl, Eric Clarke, Carol Krumhansl, David
Temperley, Geraint Wiggins, and Martin Rohrmeier.10 Oftentimes, scholars state that music
theoretical descriptions and analyses rely in fact on implicit assumptions about the cognitive
acknowledge the need for more interdisciplinary exchange, instead of accusing each other of
being reductionistic in one sense or another. With Voice Leading, Huron accepts this
challenge and shows that music theory and music psychology can indeed form a powerful
alliance. Furthermore, Huron supplements the existing literature on the matter, such as
10 Ray Jackendoff and Fred Lerdahl, “Generative Music Theory and Its Relation to
Clarke, “Mind the Gap”; Carol L. Krumhansl, “Music Psychology and Music Theory:
Geraint A. Wiggins, “Music, Mind and Mathematics: Theory, Reality and Formality,”
The book focusses on how our perceptual predispositions shape the way we hear, and,
consequently, how music that aims to evoke pleasure by setting up a rich but not overly
complex auditory scene can achieve this goal by following the rules proposed by the author.
Therefore, Voice Leading does indeed succeed in explaining “the science behind a musical art.”
Fabian C. Moss studied music, mathematics, and educational studies at the University of
Cologne and the Hochschule für Musik und Tanz Köln, from which he also holds an MA in
Universität Dresden. His research interests include the connection between music
theory and cognition, especially formal and computational approaches to chromatic harmony
Common Practice (Oxford and New York: Oxford University Press, 2011).