Representing Musical Knowledge in A Jazz Improvisation System
Damon Horowitz
MIT Media Lab
20 Ames St.
Cambridge, MA 02139 U.S.A.
[email protected]
that the program always knows the metric and harmonic context, where the “breaks” are, etc.). The program listens only to the single-line improvisations of the other soloists, and produces single-line improvisations in its turn; both of these lines are represented as MIDI streams, and the operations happen in real time. Finally, the current system has no provision for retaining learned material beyond a single session; it begins each session knowing only those musical lines, figures, and concepts which are part of its database. In principle, the current architecture is capable of supporting an extension which retains new musical and conceptual material over time.

The central portion of the model is the representation of melodic concepts, the types of musical ideas that the program knows about and can therefore create and recognize. This representation is used by the procedures that analyze and generate melodic lines. These procedures make use of the following additional modules in the program (Figure 1): a set of mental state variables, which represents the program’s current set of goals and emotions, as well as the levels of activation of concepts in the concept network; a motor-action module, which takes as input short fragments of textural descriptions of a line by the low-level pitch and rhythm agents and produces the actual output notes; routines for updating the current harmonic and metric context (e.g., spreading activation to indicate an approaching chord change); a database of melodic lines and figures that correspond to the wide set of tunes, sections of improvisations, etc., that commonly occur in the solos of this period (each musician is familiar with a set of phrases, and also has his own “bag of tricks” that are repeatedly used); and the set of melodic lines and figures that have been played during the current song and are therefore active ideas in the environment. These elements constitute the song-environment or context that the program knows about when analyzing and generating.

Representation of Music

I propose that multiple representations of musical material are necessary for flexible and sophisticated analysis and generation. My perspective in developing mechanisms which can handle the large amount of information required for musical intelligence is largely influenced by the theory of mind described in Society of Mind [Minsky 86]. Minsky suggests that intelligent behavior can arise from the interaction of many specialized smaller agents; each of our high-level actions and thoughts is described as requiring the activation of many different kinds of representations and processes with localized information. In this scheme, concepts are represented in memory by K-lines, which are recorded state values of different agents. For example, an apple is represented by a set of K-lines, each of which activates agents of color, taste, shape, weight, etc. to be in a particular state.

In my model, musical concepts are represented by specialized agents. Each type of concept has an agent that describes it, listing: its components and their relations and qualities; the preconditions for its occurrence, and the postcondition effects of its occurrence; the parameters which determine the range of different instantiations of the concept; and pointers to higher-level agents for which this concept plays a role. By separating concepts into agents, each agent can be an expert in a single specialized and simple task; it need only be aware of that information which relates to its concept, thus insulating it from the complexity of dealing with unnecessary information. Each agent is able to detect the occurrence of the concept that it corresponds to through sensors, and also to effect an occurrence of the concept through actuators.

For example, an agent for ornamentation of a pitch lists the different types of ornaments which are possible (described in terms of the parameters that distinguish them), has sensors that parse streams of pitches to locate these ornaments around an important pitch, and can produce rhythm and pitch textures that realize ornaments of a given pitch. Each agent is sensitive to context in that the behavior of each of the mechanisms that works with each type of information is influenced by the settings of other agents. In this example, the ornamentation agent is told which pitch to elaborate, and then in turn tells the rhythm and pitch line agents how to create a specific ornament.

The agents are arranged in overlapping hierarchies according to the nature of the concept that they represent. The main axes of these hierarchies correspond to the following features: the effect of the concept, what it conceptually opens or closes, its emotional quality, etc.; and the level of abstraction of the concept (from the musical surface to abstract concepts, and also from small chunks of time to larger forms), e.g., a concept that handles lists of notes, contains contours of simple features, depicts an abstract relation between chunks of music, etc. Thus, the set of musical concepts that the program knows about is
arranged in a large interconnected network, linked both by their positions in the hierarchies and by their relations as components and pre-/post-conditions of each other.

Each concept also has an activation level, a number which reflects its presence in the musical material when listening, and its influence on the musical material that is being generated. Activation is spread through the network to guide analysis and generation. When a concept is recognized during the analysis of input, the agent associated with that concept becomes more active, corresponding to the intuitive notion of semantic priming used in cognitive science and linguistics. The more active agents are then the first to be considered when examining new material. When a concept is determined to be appropriate to occur in a generated improvisation, it is also given higher activation; the degree of activation then determines the influence that each agent has over the output. In addition, each of the musical lines in the known database and each of the active musical figures in the song-environment also has a level of activation, corresponding to the degree to which it is recognized in incoming music or the degree to which it is considered in generating music. Activation levels are propagated between the figures and the concepts which constitute the figures; this completes the intuitive model of specific instances of figures priming the general concepts of the figures, as well as the reverse process in which a general concept primes the specific figures that are instances of it. The following section describes the types of concepts that are thus represented in the system.

Musical Concepts

The musical concepts represented by the system are those which describe a musical phenomenon in terms of its composition of smaller musical structures. While there are a variety of conflicting interpretations about how humans represent music, there are several types of structures which seem to be generally and uncontroversially acknowledged as contributing to our cognition of music. On a basic level, features of notes (their relative emphases, articulations, color [harmonically], register, etc.) must be recognized and recorded as contours of changing parameter values over time [Dowling 94]. These features also imply expected continuations, as suggested by [Narmour 90]. The next level of structure consists of groupings of events (notes and sets of notes), based upon similarity of features. These simple levels of comprehension are necessary for musical tasks, and are producible by mechanisms which exist for general purpose human cognition.

Analytic reductions of music are higher-level musical structures; these are interpretations through which some notes can be heard as serving subsidiary roles to others. In particular, some events are said to be elaborations or ornaments of major events, or are said to lead up to or prolong major events, recursively through larger sections of music [Lerdahl and Jackendoff 83]. This is relevant here insofar as it relates to human comprehension and memory of a line of music; instead of simply memorizing a list of timed pitches as such, a musician can remember the major events and the ways in which they are elaborated. Forming groups and reductions requires perspectives on what in the musical surface is perceived as standing out or being “marked for consciousness”, in terms of the coloration of each event and its perceived accent given the roles that it is serving. For example, we might say that the first note of a phrase is the accessor through which we start replaying the phrase in our heads, but it is really the first foundational tone that we remember as being what the phrase is about in a reduction, while we also focus on the particular elaborating tone that gave the phrase its distinctive quality. Maintaining these different perspectives is useful for both generating and analyzing new material.

These concepts are filters through which we can examine a phrase in order to determine its qualitative effects and its similarity to other phrases from different perspectives. Identifying these levels in the musical surface allows us to have a notion of a phrase that is richly interwoven with our general knowledge about types of phrases. It is through these devices that the concept of one phrase referring to another (either within the solo, or from a separate tune) is realized.

Melodic Line

Different agents capture different features of a melodic line. Actual pieces of a line are stored as a collection of links to the states of agents in the network. In the terminology of SOM, a fragment of music is a set of K-lines which record the state of different agents as they reflect the music, and can later reactivate the agents to assume the same state. Each agent can be seen as functioning as a feature-space onto which a section of music is projected, thus focusing upon some subset of its features. The state of the agent when considering this piece of music is the set of parameter values and pointers to other agents that it uses to understand (parse) the music. In sum, a melodic line is represented as the set of perspectives on the line contained by the different agents in the system.

For example, the multiple representations of a simple motive (a list of timed notes) are the set of K-lines reflecting its rhythmic, melodic, and harmonic components (Figure 2). The rhythm agent determines the level of activity (which is roughly similar to pulse level), the amount and type of syncopation, and the prominence of other cycles (e.g., a crossrhythm or polyrhythm) to find salient rhythmic figures. The melody agent looks at the pitch contour and the abstracted simplified lines (prolongation and timespan reductions) for key anchor points. The harmony agent determines the hierarchical relations of tension and resolution pitches given the harmonic framework of the song. Each of these agents then has its own representation of possible groupings, and points of emphasis. Finally, there is a structural agent, which examines the patterns present in each of the specific agents’ representations (by querying them) and indicates places of repetition or simple variation of structure. One of the strong features of this model is that it maintains the integrity of each agent managing its own information in its own way; for example, each agent has different criteria for what constitutes a similar repetition or a slight variation.

[Figure 2 (image not recovered). Panel labels: reduced melody: F g E D a C; rhythmic activity; repeated structures.]

The key point is that the system has multiple representations of a melodic line; this is appropriate because different functions require different types of information about a line, since different tasks
operate from different perspectives.

The benefit of this type of representation is that a single musical line can be manipulated in different ways depending upon which aspects of it (i.e., which agents’ features) are considered. A simple example of this is the separation of rhythmic and pitch information about a line; a more complex example is the separation of a prolongation reduction perspective from the set of local elaboration figures that are used. Further, since the agents themselves exist in hierarchies of types of concepts, a given line can be seen as a specific instantiation of general abstract types of concepts. This abstraction allows for comparison of musical figures according to their types along abstract feature axes, and provides a rich set of metrics for determining their similarity or relation (information which is necessary for understanding music, and also used in generating variations).

Given these types of representations of the musical line itself, specifically of the structures of phrases, parts of phrases and groups of phrases related to context over time, several phenomena related to our perception, consideration, and creation of music can also be modeled. The general idea of chaining musical ideas, or of having musical associations or priming of categories, is represented in this model by spreading activation of ideas through hierarchies of types of structures defined above. This leads directly into the model’s description of generating music, which is summarily described as follows: goals and intentions spread activation to concepts which realize them (downward through a network), while at the same time the currently active figures and structures in the environment spread activation to their related concepts (up through a network), and the concepts with the highest activation are the ones which are realized in the generated music. The following sections describe the listening and generating functions of the program. These functions are the inspiration for the general representation scheme outlined above, which is designed to accommodate the types of manipulations of information that analyzing and playing require.

Listening

The program’s task when listening is to identify, in an incoming stream of notes, the musical concepts which the system knows about. In other words, the activity of listening consists of building instances of the representations discussed above. This is accomplished through the use of the sensors on each type of representation; the sensors detect a feature in the input (or in the agents conceptually beneath it) that is a component of the concept it represents. In this way, the different agents piggyback on each other’s discoveries of features in the input. For example, the low-level rhythm agent parses the input into notational rhythm values, which are then grouped by a rhythm-grouping agent, and examined by a cycle-agent for the presence of crossrhythms.

High-level agents represent musical form, a concept referring to any level of structural repetition. This concept identifies the repetition of similar phrases and the repetition of a figure within a phrase. These agents recognize repeating patterns in the music by looking for patterns in the other agents’ parses of musical lines into features. Rather than examining each agent’s state over time, the form-agents poll the other agents, requesting to be notified when a repetition has occurred. Each individual agent locates simple patterns in its type of feature by itself, similar to the phenomenon suggested by SOM as time-blinking. The form-agents look for repetition in conventional places (e.g., corresponding to metrical divisions), but can also be activated if an agent determines an unconventional repetition, which in turn will cause the form-agents to suggest to the cycle-agent that a crossrhythm may exist. Again, the abstract hierarchies present in the representation are useful here; patterns which are not literal repetitions in the musical surface but are direct conceptual repetitions can be easily detected with this scheme.

The other major activity of listening is the spreading of activation between concepts. This models the experience of priming and expectation: predicting future material based upon the past. The representation of concepts containing sequential events lends itself to a direct implementation of this idea: when a given agent’s sensors have noticed the presence of a component which begins a sequence in a concept, the agent primes the appropriate feature detectors (either sensors or lower agents) to expect the next components in the sequence. For example, an agent for tonal resolution expects a tonic after detecting a leading tone, and thus the pitch agent is set to look for the appropriate note. This corresponds roughly to the description of language understanding in SOM. The input of a word (here, a musical “word” is an event for an agent at any level) activates a set of frames which are interested in this word, either semantically (e.g., an apple activates ideas about eating) or in terms of conceptual dependencies (e.g., the origin role of a trans-frame). These correspond respectively to spreading of activation to more generalized concepts from the parsed one, and following a sequential
figure which expects later roles to be filled in. A failure of the sensors to find the expected event requires extra processing to recommence the parse of the input. The occurrence of this frustration, at different levels of severity, is a key component of Meyer’s theory of emotion in music [Meyer 56]. The system implements this theory literally by changing the emotional state variables of the program when these frustrations occur. The listener thus indicates both the comprehensibility of input (in terms of its susceptibility to each agent’s attempts to analyze it) and also its qualitative interest (each agent can subjectively label the local phenomena it encounters).

The final result of the listening process is that the input musical line is memorized as part of the active environment during the song. It remains in this buffer, accessible for further listening and playing processes, until the conclusion of the song. The active musical ideas already in the environment are compared against the incoming line. A match can be used to identify the higher-level structural forms being played. That is, the system can recognize if material (conceptual or literal notes) from its set of ideas is being quoted or referred to, and can then label the current setting appropriately (e.g., as an elaboration of the song’s tune, or as a series of variations on the previous solo’s closing phrase). At no point is it necessary to decide upon a single group parsing or reduction interpretation of the input, since all of the active agents’ states are stored as a set of K-lines that represents the figure. However, the relationship between consecutive phrases (or between any phrase and its original referent) indicates which aspects of the representation, and thus which interpretation, are focused upon in a given instance. These sequences of relations (such as the maintenance of a reduced melodic line or of rhythmic figures) are then stored as the high-level structural form describing a script that the input is assumed to have followed. Using the original solo’s relations between its phrases is a technique for creating a similar set of phrases in a new solo, leading to the effect of having played the same “sort of thing” or in the same “style”.

Generation

Generation of an improvisational line is viewed here as an action selection problem. I assume that an improvisation does not follow a simple set of rules, but rather is influenced by a variety of sources concurrently. This type of decision making is well modeled by a spreading activation network which responds to both goals and the environment [Maes 89]. My model uses a hybrid system in which actions are chosen by competition through spreading activation combined with traditional AI structures of rule-based constraints and script-following fragments. The actions in the network are the hierarchically arranged musical concepts, each of which is a type of thing that can happen in music.

When generating a line, the program’s goals spread activation through the network of concepts, as do the active ideas in the song-environment. As this spreading activation is occurring, the motor module launches sequences of a few notes (corresponding to a single learned physical gesture, a motor “riff”) according to the lowest-level agents’ descriptions (a pitch and rhythm texture). As each motor routine is concluded, these agents are polled again to determine what the next motor action will be. In other words, the decision of what to do is made by simply checking which actions look best at the time that an action is needed; agents with the highest values among the competing agents when polled by the motor routine are the ones which affect the output. This corresponds to the intuitive notion of a player selecting a sequence of small paths that he can realize on his instrument; this approach assumes that all actions are motivated, if by nothing else then by the goal to just do something, or by a kinesthetic sense [Sudnow 78]. Note that to be silent is to have the explicit action concept of rest be more active than the others. Here are some other examples of action/concept descriptions which could be actively influencing the output: launch a new phrase, choose a motive from the melody, vary the previous phrase, repeat a partial structure, hit a high “C”, use filler material for two beats, emphasize a structural element, assert a syncopated rhythmic figure, conclude this solo, etc.

The use of goals here is based upon the standard use in spreading activation networks for action selection [Maes 94]. The goals spread to concepts that produce the qualitative label associated with the goal: pitch/time trajectories modeled after [Clynes 78] relate directly to emotional qualities, as do higher-level relations between phrases (e.g., call and response, amplification or elaboration, contrast, etc.). [Riecken 92] describes a system that is similarly interested in the relationship between the effect of musical features (his focus is on pitch intervals) and emotions in determining generated music. In my model, the framework allows for the labeling of any type of concept (e.g., structures, groupings, reductions, roles, colorations) with an affective quality, which can vary as a function of context.

Each concept can be seen as a short script for how to perform a certain behavior. The actuators of an agent define the sequence of steps and types of conditions required to realize the concept. This occurs both on a small level, as in the case for a script ornamenting a pitch, and on a larger level, such as a conventional script for developing material over the course of a solo. A script for a solo is represented in terms of form-agent structural relations between the phrases in the solo, with the corresponding emotional effects for each relation; this allows for a description of rising intensity, or trajectories through “moods”, or an entire dramatic form played over the course of a chorus. For example, a script for a solo could say: begin with a statement of material active in the environment (perhaps from the song’s main tune); focus on a perspective of this material that corresponds to an affective label in keeping with the program’s mental state; maintain this aspect while creating a variation of the original statement; evolve from this newly played material through successive variations while rising in intensity (modeled as a degree of exaggeration in the parameters of a perspective) towards a climax; then close the solo, resolving any active musical concepts.

Spreading of activation happens in several directions. In a top-down fashion, goals spread to those mechanisms that realize them, such as the scripts for how to behave over the course of a solo and the treatment of conventions. In addition, the agent for each concept has actuators which
deliver activation to the agents which realize its preconditions and to those which represent its sub-components. For example, the appoggiatura concept activates the rhythm agent and the pitch agent in synchrony to place the expected goal tone on a less stressed pulse following a coloration tone in the stressed position.

Bottom-up spreading occurs from those figures which are selected to be played to their associated successors. This is a specific case of the general spreading of activation from lines active in the song-environment. A figure active in the environment spreads activation to its associated concepts’ actuators at those moments when it can begin (with respect to the position in the solo, the chord changes, the meter, etc.). For example, a likely figure for a turnaround and the concept of starting a solo with two similar 4-bar phrases are each activated at particular moments. If a figure has been started (that is, if its initial concepts correspond to those chosen to be played), it remains active and activates the rest of its sequence of concepts until they are not chosen by the system. Thus, there is spreading activation from the context of the song (serving as the environment): it both suggests figures and concepts that can occur at a particular location from the set of figures active in the song-environment, and also works on a low level by urging rhythmic actions (e.g., spreading activation to the agent that seeks to play rhythms in phase with the meter) and pitch choices corresponding to the metrical and harmonic context.

This use of spreading activation is essentially a way to combine the interests of different musical concepts that the system wants to realize at a given point. However, it also allows for simple emergent planning behavior [Maes 89]; for example, the concept of doing a leap requires that the leap start from a chord tone and thus spreads activation to the pitch agent to effect this. However, the most important feature of the use of a spreading activation network here is its capacity to model the phenomenon of chaining musical associations and ideas in the mind of a musician. This allows exploration of high-level ideas about how a solo evolves over time, maintains a sense of direction, or follows a dramatic curve of intensity. By continually referring to what it has done, a system can be opportunistic in its composition, cascading local processes to create larger forms.

4 Conclusion

This paper proposes an architecture for the representation of common sense concepts about musical improvisation, of the structures we notice when listening and intend to create when playing. The representation models how these concepts may be recognized and produced. Each concept is embedded in a network of other musical concepts, so that the meaning and consequences of concepts are defined by their effect upon other agents, the mental state, etc. I suggest that it is through such explicit enumeration and quantification of musical concepts, which can represent different and conflicting perspectives, that progress can be made towards understanding musical intelligence.

I share Widmer’s acknowledgment of the knowledge-intensive nature of questions about musical understanding concerning performance expression, and believe this also applies to beat-tracking, the nature of swing or of a “groove”, the type of interaction between two soloists, etc. These questions can be better pursued on top of a framework of common sense knowledge about music. To program this knowledge is a difficult task. Towards this end, I have created an architecture based on ideas from Minsky and Maes, and a specific model of a task in a restricted genre.

The proposed computational model of improvisation reflects my approach to building intelligent music systems. The use of multiple representations, consisting of K-lines activating agents, combined with a spreading activation network, allows for an interesting model of the chaining of musical associations in the mind of a performer. As different aspects of what is being played (the focus of different agents) are given more attention, the active agents can emphasize and expand this type of information in a melodic line. The spreading activation network can model both a temporary obsession with a particular melodic idea and its eventual fatigue. The desired result is that an improvisation can develop a sense of direction over time based upon the stringing together of fragments of the representations of active musical materials. The limitations of the current model include the absence of an ability to learn and the restriction to a specific genre. Further evaluation of the model and the general approach awaits completion of the implementation.

References

Agre and Chapman (1988). “What are Plans for?”. A.I. Memo #1050. MIT.
Amadie, J. (1990). Jazz Improv. Thornton Publications, Pennsylvania.
Bharucha, J. (1994). “Tonality and Expectation”, in Musical Perceptions. Oxford University Press, New York.
Clynes, M. (1978). Sentics: The Touch of Emotions. Anchor Press, Garden City.
Desain, P. and Honing, H. (1992). Music, Mind, and Machine: Studies in Computer Music, Music Cognition, and Artificial Intelligence. Thesis Publishers, Amsterdam.
Dowling, W. (1994). “Melodic Contour in Hearing and Remembering Melodies”, in Musical Perceptions. Oxford University Press, New York.
Horowitz, D. (1994). “A Hybrid Approach to Scaling Action Selection”. Unpublished paper, MIT Media Lab.
Johnson-Laird (1991). “Jazz Improvisation: A Theory at the Computational Level”, in Representing Musical Structure. Academic, London.
Krumhansl, C. (1990). Cognitive Foundations of Musical Pitch. Oxford University Press, New York.
Lerdahl, F. and Jackendoff, R. (1983). A Generative Theory of Tonal Music. MIT Press, Cambridge.
Levitt, D. (1981). A Melody Description System for Jazz Improvisation. Masters Thesis, MIT, Cambridge.
Lidov, D. (1975). On Musical Phrase. Groupe de Recherches en Semiologie Musicale, Universite de Montreal.
Maes, P. (1989). “How to do the Right Thing”, in Connection Science Journal Vol. 1, No. 3.
Maes, P. (1994). “Modeling Adaptive Autonomous Agents”, in Artificial Life Journal Vol. 1, No. 1 and 2. MIT Press, Cambridge.
Mehegan, J. (1959-1965). Jazz Improvisation, in four volumes. Watson-Guptill Publications, New York.
Meyer, L. (1956). Emotion and Meaning in Music. University of Chicago Press, Chicago.
Minsky, M. (1986). The Society of Mind. Simon and Schuster, New York.
Minsky, M. (1989). “Music, Mind, and Meaning”, in The Music Machine. MIT Press, Cambridge.
Narmour, E. (1990). The Analysis and Cognition of Basic Melodic Structures: The Implication-Realization Model. University of Chicago Press.
Pennycook et al. (1993). “Toward a Computer Model of a Jazz Improvisor”, in International Computer Music Conference 1993 Proceedings, ICMA, California.
Pressing, J. (1985). “Experimental Research into Musical Generative Ability”, in Generative Processes in Music. Oxford University Press, New York.
Ramalho, G. and Ganascia, J. (1994). “Simulating Creativity in Jazz Performance”, in Proceedings of the Twelfth National Conference on Artificial Intelligence, Seattle, WA.
Riecken, R. (1992). “Wolfgang: A System Using Emotion Potentials to Manage Musical Design”, in Understanding Music with AI. MIT Press, Cambridge.
Rowe, R. (1993). Interactive Music Systems. MIT Press, Cambridge.
Schank, R. and Abelson, R. (1977). Scripts, Plans, Goals, and Understanding. Erlbaum, Hillsdale, New Jersey.
Schuller, G. (1968). Early Jazz: Its Roots and Musical Development. Oxford University Press, New York.
Sudnow, D. (1978). Ways of the Hand: The Organization of Improvised Conduct. Harper and Row, New York.
Widmer, G. (1992). “A Knowledge Intensive Approach to Machine Learning in Tonal Music”, in Understanding Music with AI. MIT Press, Cambridge.