The Continuator: Musical Interaction With Style: François Pachet
1. Introduction
Music improvisation is both a fascinating activity and a
very frustrating one. Playing music requires an intimate
relationship between musical thought and sensory-motor
processes: the musician must think, listen, develop ideas
and move his/her fingers very quickly. Speed and the lack
of time are crucial ingredients of improvisation; they are
what makes it exciting. They are also what makes it frustrating:
beginners as well as experienced musical performers are
by definition limited by their technical abilities, and by
the morphology of the instrument.
We propose to design musical instruments that explicitly
address this issue, by providing real-time, efficient and
enhanced means of generating interesting musical
material.
Musical performance has been the object of numerous
studies, approaches and prototypes, using virtually all the
computer techniques at hand. In our context, we can
divide these approaches into two categories: interactive
In proc. of International Computer music Conference, Gotheborg (Sweden), ICMA, September 2002.
intervention to unexpected changes in rhythm, harmony
or style. Finally, the very design of the system allows the
sharing of stylistic patterns in real time and constitutes in
this sense a novel form of collaborative musical
instrument.
The remainder of the paper is structured as follows. First
we introduce the architecture of the proposed system, its
inputs and outputs. We then describe the heart of the
engine, based on a Markov-based model of musical styles.
This model is augmented with 1) a hierarchical model of
learning functions to adapt to imprecision in musical
inputs and 2) a facility for biasing the Markovian
generation, to handle external information such as
changing harmony. Finally, we illustrate the use of the
system in various musical contexts: solos,
accompaniments, and collaborative music improvisation.
2. Architecture
In this paper we focus on a Midi system linked to an
arbitrary Midi controller. The experiments described here
were conducted with Midi keyboards and guitars, and are
easily applicable to any style and Midi controller. An
audio version is currently in progress, and the ideas
proposed in this paper are largely independent of the
nature of the information managed.
We consider music as temporal sequences of Midi events.
The information we represent is: pitch (an integer between
0 and 127), velocity/amplitude (also between 0 and 127),
and temporal information on start and duration times,
expressed as long integers with a precision of 1
millisecond, as provided by the MidiShare Midi operating
system (Orlarey & Lequay, 1989). In the standard
playing mode, the system receives input from one musician.
The output of the system is sent to a Midi synthesizer and
then to a sound reproduction system (see Figure 1).
Figure 1. Architecture: the Midi controller sends a Midi input stream to the Continuator, whose Midi output stream is sent to a Midi synthesizer.
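The event representation described above can be sketched as follows; this is an illustrative data structure, not the paper's implementation, and the class name is an assumption.

```python
# A minimal sketch of the Midi event representation described above:
# pitch and velocity as integers in 0-127, start time and duration in
# milliseconds. The class name (MidiEvent) is illustrative.

from dataclasses import dataclass

@dataclass
class MidiEvent:
    pitch: int     # 0-127
    velocity: int  # 0-127 (amplitude)
    start: int     # start time in milliseconds
    duration: int  # duration in milliseconds

# Middle C, medium-loud, starting at the origin, lasting half a second.
note = MidiEvent(pitch=60, velocity=100, start=0, duration=500)
```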
complete and 2) as efficient as possible. We briefly describe
here the design of this learning scheme, which can be
seen as an efficient implementation of a complete
variable-order Markov model of input sequences, as
initially introduced by Ron et al. (1996).
This technique consists of building a prefix tree by a
simple, linear analysis of each input sequence. Each time
a sequence is input to the system, it is parsed from right to
left and new prefixes encountered are systematically
added to the tree. Each node of the tree is labeled by a
reduction function of the corresponding element of the
input sequence. In the simplest case, the reduction
function can be the pitch of the corresponding note. We
describe more advanced reduction functions in the next
section, and stress their role in the learning
process. To each tree node is attached a list of
continuations encountered in the corpus. These
continuations are represented as integers, denoting the
index of the continuation item in the input sequence. This
indexing scheme makes it possible to avoid duplicating
data by manipulating only indexes. When a new
continuation is found for a given node, the corresponding
index is added to the node's continuation list (shown in
the figure between braces {}).
For instance, suppose the first input sequence is {A B C
D}. We will progressively build the tree structure
illustrated in Figure 2. These trees represent all possible
prefixes found in the learnt sequences, in reverse order, to
facilitate the generation process (see next section). In the
first iteration, the sequence is parsed from right to left,
and produces the left tree of Figure 2. First, the node C is
created, with continuation index {4}, representing the last
D of the input sequence. Then node B is added as a son of
node C, with the same continuation index {4}. Finally,
node A is created as a son of node B, with the same
continuation index.
Then the parsing starts again for the input sequence minus
its last element, i.e. {A B C}, to produce the middle tree
of Figure 2. In this tree, all nodes have {3} as a
continuation (meaning item C). Finally, the sequence {A
B} is parsed and produces the tree on the right of Figure
2. Nodes are created only the first time they are
needed, with empty continuation lists. The tree grows as
new sequences are parsed, initially very quickly, then
more slowly as encountered patterns repeat.
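The tree-building step above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names are invented, and it uses 0-based continuation indexes where the figures use 1-based ones.

```python
# Sketch of the learning step: each input sequence is parsed from right
# to left, every prefix (in reverse order) is added to the tree, and each
# visited node records the index of the item that followed the prefix.
# Names (PrefixNode, learn) are illustrative; indexes are 0-based here.

class PrefixNode:
    def __init__(self):
        self.children = {}       # symbol -> PrefixNode
        self.continuations = []  # indexes of continuation items

def learn(root, seq):
    """Add all reversed prefixes of seq to the tree rooted at root."""
    for end in range(len(seq) - 1, 0, -1):
        node = root
        # Walk the prefix backwards, creating nodes as needed, and
        # attach the index of the continuation item at each node.
        for i in range(end - 1, -1, -1):
            node = node.children.setdefault(seq[i], PrefixNode())
            node.continuations.append(end)

root = PrefixNode()
learn(root, ['A', 'B', 'C', 'D'])
# The path C -> B -> A now holds continuation index 3 (item D),
# matching the left tree of Figure 2 (which shows {4}, 1-based).
```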
Figure 2. The prefix trees built from the sequence {A B C D}, then grown by subsequent input sequences; each node carries its continuation index list in braces (e.g. {3, 8, 7}).
sequence, or do not find the corresponding node. When
the walkthrough is finished, we simply return the set of
continuations of the corresponding node. In our case we
find a continuation for the whole input sequence {A B}:
Continuation_List ( {A B} ) = {3, 7}.
These indexes correspond to items {C, B}. A
continuation is then chosen by a random draw. Suppose
we draw B. We then start again with the new sequence {A
B B}, for which we repeat the retrieving process to find
the continuation list:
Continuation_List ( {A B B } ) = {8}.
We choose the only possible continuation (index 8
corresponds to item C) and get {A B B C}. We do not
find any continuation for the whole sequence {A B B C},
but we get continuations for the longest possible
subsequence, that is here:
Continuation_List ( { B C} ) = {4}.
We therefore get the sequence {A B B C D} and continue
the generation process. At this point, there is no
continuation for {A B B C D}, nor for any
subsequence ending in D (indeed, D has always been a
terminal item in our learnt corpus).
In this case, when no continuation is found for the input
sequence, a node is chosen at random. We will see in the
next section a more satisfactory mechanism for handling
such cases of discontinuity.
It is important to note that, at each iteration, the
continuation is chosen by a random draw, weighted by the
probabilities of each possible continuation. Because
repeated items appear multiple times in the continuation
list, drawing an item uniformly from the list directly
yields these probabilities. More precisely, for a
continuation x, its probability is:
Markov_Prob(x) = (number of occurrences of x in L) / |L|,
where L is the continuation list.
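As a tiny numerical illustration of this occurrence-based probability (assuming, for readability, a continuation list already resolved to items rather than indexes):

```python
# Occurrence-based probability of drawing item x uniformly from a
# continuation list L, as in Markov_Prob above. Illustrative names.

from collections import Counter

def markov_prob(x, continuation_list):
    """Number of occurrences of x in the list, divided by its length."""
    return Counter(continuation_list)[x] / len(continuation_list)

p = markov_prob('C', ['C', 'B', 'C'])  # 2 occurrences out of 3
```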
Since the continuations are in fact indexes to the original
sequences, the generation can use any information from
the original sequence which is not necessarily present in
the reduction function (e.g. velocity, rhythm, midi
controllers, etc.): the reduction function is only used to
build the tree structure, and not for the generation per se.
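The generation walkthrough above can be made concrete with a minimal sketch. For brevity it uses a dictionary of prefixes in place of the tree and stores items rather than indexes into the original sequence; all names are illustrative, not the paper's implementation.

```python
# Sketch of the generation loop: find the longest suffix of the current
# sequence that has a known continuation list, draw one continuation
# uniformly (repeated items give the occurrence-based weighting), append
# it, and repeat. A dict of prefix tuples stands in for the prefix tree.

import random

def learn(model, seq):
    """Record, for every prefix of seq, the item that followed it."""
    for end in range(1, len(seq)):
        for start in range(end):
            model.setdefault(tuple(seq[start:end]), []).append(seq[end])

def continuation_list(model, seq):
    """Return continuations of the longest matching suffix, or None."""
    for start in range(len(seq)):
        conts = model.get(tuple(seq[start:]))
        if conts:
            return conts
    return None

def generate(model, seed, length):
    out = list(seed)
    for _ in range(length):
        conts = continuation_list(model, out)
        if conts is None:
            break  # no continuation found: the paper restarts at random
        out.append(random.choice(conts))
    return out

model = {}
learn(model, ['A', 'B', 'C', 'D'])
learn(model, ['A', 'B', 'B', 'C'])
# continuation_list(model, ['A', 'B']) yields {C, B}, as in the text.
```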
4. Reduction functions
As we saw in the preceding section, the graph is not built
from raw data. A Midi sequence has many parameters, not
all of which are necessarily interesting to learn. For
instance, a note has attributes such as pitch, velocity,
duration, start time. A chord has attributes such as the
pitch list, possibly its root key, etc. The system we
propose allows the user to choose explicitly from a library
of predefined reduction functions. The simplest function
is the pitch. A more refined function is the combination of
would then draw a new note at random, and actually start
a new sequence.
5.1. Polyphony
Figure 6. An input sequence that does not exactly match
the learnt corpus.
5.2. Rhythm
Rhythm refers to the temporal characteristics of musical
events (notes, or clusters). Rhythm is an essential
component of style and requires a particular treatment. In
our context, we consider in effect that musical sequences
are generated step by step, by reconstructing fragments of
already parsed sequences. This assumption
unfortunately does not always hold, as some rhythms do not
afford reconstruction from arbitrarily sliced bits and pieces.
As Figure 7 illustrates, the standard clustering process
does not take into account the rhythmic structure, and this
may lead to strange rhythmical sequences at the
generation phase.
This problem has no universal answer, but different
solutions according to different musical contexts. Based
on our experiments with jazz and popular music
musicians, we have come up with several modes
that the user can choose from:
Natural rhythm: The rhythm of the generated sequence is
the rhythm as it was encountered during the learning
phase. In this case, the generation explicitly restitutes the
temporal structure as it was learned, and in particular
undoes the aggregation performed and described in the
previous section.
Linear rhythm: this mode consists of generating only
streams of eighth notes, that is, with a fixed duration and all
notes concatenated. This allows generating very fast and
impressive phrases, and is particularly useful in the bebop style.
Input rhythm: in this mode, the rhythm of the output is the
rhythm of the input phrase, possibly warped if the output
is longer than the input. This makes it possible to create
continuations that sound like rhythmic imitations.
Fixed metrical structure: For popular and heavily
rhythmic music, the metrical structure is very important
and the preceding modes are not satisfactory. Conklin
and Witten (1995) suggest using the location of a note in
a bar as yet another viewpoint, but this scheme forces the
use of quantization, which in turn raises many issues that
are intractable in an interactive context.
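The "input rhythm" mode described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the function name is invented, and cycling through the input rhythm is a crude stand-in for the warping mentioned in the text.

```python
# Sketch of the "input rhythm" mode: generated pitches take their onsets
# and durations from the input phrase. When the output is longer than
# the input, the rhythm is cycled, each pass shifted by the span of the
# input phrase (a crude stand-in for warping). Illustrative names.

def apply_input_rhythm(output_pitches, input_rhythm):
    """input_rhythm: list of (onset_ms, duration_ms) from the input phrase."""
    events = []
    # Total span of the input phrase: last onset plus last duration.
    span = input_rhythm[-1][0] + input_rhythm[-1][1]
    for i, pitch in enumerate(output_pitches):
        onset, dur = input_rhythm[i % len(input_rhythm)]
        cycle = i // len(input_rhythm)  # shift repeated passes forward
        events.append((pitch, onset + cycle * span, dur))
    return events

# Three output pitches over a two-note input rhythm: the third note
# reuses the first slot, shifted by the input span.
events = apply_input_rhythm(['C', 'D', 'E'], [(0, 100), (100, 100)])
```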
In the context of musical interaction, this property is not
always the right one, because many things can happen
during the generation process. In particular, in the case of
tonal music, the harmony can change. Typically, in a Jazz
trio for instance, the pianist plays chords that have no
reason to remain the same throughout the generation
process. Because we target a real-world performance
context, these chords are not predictable, and cannot be
learnt by the system prior to the performance. The system
should be able somehow to take this external information
into account during the generation, and twist the
generated sequence in the corresponding directions.
The idea is to introduce a constraint facility in the
generation phase. External information may be sent as
additional input to the system. This information can be,
for instance, the last 8 notes (pitches) played by the
pianist, if we want the system to follow the harmony. It
can also be the velocity information of the whole band, if
we want the system to follow the amplitude, or any
information that can be used to influence the generation
process. This external input is used as follows: when a set
of possible continuation nodes is computed (see section
on generation), instead of choosing a node according to its
Markovian probability, we weight the nodes according to
how they match the external input. For instance, we can
decide to prefer nodes whose pitch is in the set of external
pitches, to favor branches of the tree having common
notes with the piano accompaniment.
In this case, the harmonic information is provided
implicitly, in real time, by one of the musicians (possibly
the user himself), without having to explicitly enter the
harmonic grid or any symbolic information in the system.
More precisely, we consider a function Fitness (x,
Context) with value in [0, 1] which represents how well
item x fits with the current context. For instance, a Fitness
function can represent how harmonically close the
continuation is to the external information. If we
suppose that piano is the set of the last 8 notes played by
the pianist, Fitness can be defined as:
Fitness(p, piano) = | p ∩ piano | / | piano |
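A minimal sketch of this Fitness function and of the biased draw it enables, under the assumption that each candidate continuation is a collection of pitches; the function names are illustrative, not the paper's implementation.

```python
# Sketch of the Fitness function and the biased Markovian draw: each
# candidate continuation (a collection of pitches) is weighted by the
# fraction of the pianist's recent pitches it shares. Illustrative names.

import random

def fitness(p, piano):
    """|p ∩ piano| / |piano| for pitch collections p and piano."""
    return len(set(p) & set(piano)) / len(set(piano))

def biased_draw(continuations, piano, rng=random):
    weights = [fitness(c, piano) for c in continuations]
    if sum(weights) == 0:
        return rng.choice(continuations)  # fall back to the plain draw
    return rng.choices(continuations, weights=weights, k=1)[0]

# A continuation sharing a pitch with the piano context is preferred:
# here [50] gets weight 0, so [60] is always drawn.
choice = biased_draw([[50], [60]], piano=[60, 62, 64])
```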
7. Experiments
We have conducted a series of experiments with the
system, in various modes and configurations. There are
basically two aspects we can assess:
1) the musical quality of the music generated,
2) the new collaborative modes the system allows.
We review each of these aspects in the following sections.
his/her own style. Additionally, we experimented with
improvisations in which one musician (György Kurtag)
had several copies of the system linked to different midi
keyboards. The result for the listener is a dramatic
increase in musical density. For the musician, the
subjective impression ranges from that of a "cruise" button,
with which he only has to start a sequence and let the
system continue, to the baffling impression of a musical
amplifying mirror.
8. Conclusion
We have described a music generation system that is
able to produce music satisfying two traditionally
incompatible criteria: 1) stylistic consistency and 2)
interactivity. This is made possible by introducing several
improvements to the basic Markovian generation, and by
implementing the generation as a real time, step-by-step
process. The resulting system is able to produce musical
continuations of any user, including beginners, according to previously learnt, arbitrary styles.
Additionally, the design of the system makes it possible to
share musical styles, and thus to open new modes of
collaborative playing.
Current work is devoted to the elaboration of an extensive
style library by recording material from experienced, top-
Acknowledgments
We thank György Kurtag, Bernard Lubat and Alan Silva
for intense and fruitful interactions with the system.
References
Assayag, G., Dubnov, S., Delerue, O. Guessing the
Composer's Mind: Applying Universal Prediction to
Musical Style, Proc. ICMC 99, Beijing, China, 1999.
Baggi, D. L. NeurSwing: An Intelligent Workbench for
the Investigation of Swing in Jazz, in Readings in
Computer Generated Music, IEEE Computer Society
Press, 1992.
Biles, John A. Interactive GenJam: Integrating Real-Time
Performance with a Genetic Algorithm, Proc. ICMC 98,
Ann Arbor, Michigan, 1998.
Borchers, Jan. Designing Interactive Musical Systems: a
Pattern Approach, HCI International '99: 8th
International Conference on Human-Computer
Interaction, Munich, Germany, 22-27 August 1999.
Conklin, D. and Witten, Ian H. Multiple Viewpoint
Systems for Music Prediction, JNMR, 24:1, 51-73,
1995.
Cope, David. Experiments in Musical Intelligence.
Madison, WI: A-R Editions, 1996.
Heuser, Jörg. Pat Martino: His Contributions and
Influence to the History of Modern Jazz Guitar. Ph.D.
thesis, University of Mainz (Germany), 1994.
Hiller, L. and Isaacson, A., Experimental Music, New
York: McGraw-Hill, 1959.
Karma music workstation, Basic guide. Korg Inc.
http://www.korg.com/downloads/pdf/KARMA_BG.pdf,
2001.
Lartillot, O., Dubnov, S., Assayag, G., Bejerano, G.
Automatic Modeling of Musical Style, Proc. ICMC 2001,
La Habana, Cuba, 2001.
New Interfaces for Musical Expression (NIME'01),
http://www.csl.sony.co.jp/person/poup/research/chi2000wshp/,
2000.
Orlarey, Y., Lequay, H. MidiShare: a Real Time Multi-tasks
Software Module for Midi Applications, Proc. of
ICMC, pp. 234-237, 1989.
Pachet, F. Roy, P. "Automatic Harmonization: a Survey",
Constraints Journal, Kluwer, 6:1, 2001.
Ramalho G., Ganascia J.-G. Simulating Creativity in Jazz
Performance. Proc. of the AAAI-94, pp. 108-113,
Seattle, AAAI Press, 1994.
Ron, D., Singer, Y. and Tishby, N. The Power of
Amnesia: Learning Probabilistic Automata with Variable
Memory Length, Machine Learning 25(2-3):117-149, 1996.
Triviño-Rodríguez, J. L.; Morales-Bueno, R. Using
Multiattribute Prediction Suffix Graphs to Predict and
Generate Music, CMJ 25(3), pp. 62-79, 2001.
Walker, William F. A Computer Participant in Musical
Improvisation, Proc. of CHI 1997, Atlanta, ACM Press,
1997.