Metacompose: A Compositional Evolutionary Music Composer: March 2016
1 Introduction
Computer music generation is an active research field encompassing a wide range
of approaches [1]. There are many reasons for wanting to build a computer system
that can competently generate music. One is that music has the power to evoke
moods and emotions – even music generated algorithmically [2]. In some cases the
main purpose of a music generation algorithm is to evoke a particular mood. This
is true for music generators that form part of highly interactive systems, such
as those supporting computer games. In such systems a common goal of music
generation is to elicit a particular mood that dynamically suits the current state
of the game play. Music generation for computer games can be seen in the context
of an experience-driven procedural content generation framework (EDPCG) [3],
where the game adaptation mechanism generates music with a particular mood
or affect expression in response to player actions.
While the medium-term goal of our system (described in Section 3) focuses
on this kind of affect expression, this paper describes work on the core aspects
of music generation, without expressly considering affective impact.
In games, unlike in traditional sequential media such as novels or movies,
events unfold in response to player input. Therefore, a music composer for an
interactive environment needs to create music that is dynamic while also being
non-repetitive. This applies to a wide range of games but not all of them; for
example rhythm games make use of semi-static music around which the gameplay
is constructed. The central research question in this paper is how to create music
that is dynamic and responsive in real-time, maintaining fundamental musical
characteristics such as harmony, rhythm, etc. We describe a component-based
framework for (i) the generation of a musical abstraction and (ii) real-time music
creation through improvisation. Apart from describing the method, the main
research question of this paper concerns validating the music generation
algorithm: do all the components of the system add to the music generated?
To that end we present and discuss the results of a participant-based
evaluation study.
2 Background
Procedural generation of music is a field that has received much attention over
the last decade. The approaches taken are diverse: they range from creating
simple sound effects, to avoiding repetition when playing human-authored music,
to creating more complex harmonic and melodic structures [4]. These approaches
can be divided into transformational and generative algorithms [5]. MetaCompose
falls in the latter category, as it creates music without any predefined
snippets to modify or recombine.
Work similar to ours can be found in the system described by Robertson
[6], which focuses on expressing fear. There are some parallels with this work,
such as the representation of the musical data through an abstraction (in their
case the CHARM representation [7]); yet we claim our system has a higher affective
expressiveness, since it aims to express multiple moods via music. There are many
examples of evolutionary approaches to generating music; two examples are
the work of Loughran et al. [8] and Dahlstedt's evolution of piano pieces [9], and
many more can be found in the Evolutionary Computer Music book [10].
Other examples of real-time music generation can be found in some patents:
a system that allows the user to play a solo over some generative music [11] and a
system that can create complete concerts in real time [12]. An interesting parallel
between the second system and ours is the incorporation of measures of
"distance" between music snippets to reduce repetition. Still, neither of these
approaches presents explicit affective-expression techniques.
Livingstone and Brown [13] define a dynamic music environment
where music tracks adjust in real-time to the emotions of the game character
(or game state). While this work is interesting, it is still limited (in our
opinion) by the use of predefined music tracks for affective expression. Finally,
Mezzo [14], a system designed by Daniel Brown that composes neo-Romantic
game soundtracks in real time, creates music that adapts to the emotional states of
the character, mainly through the manipulation of leitmotifs.
2.2 Multi-Objective Optimization
Many search/optimization problems have not only one or several numerical ob-
jectives, but also a number of constraints – binary conditions that need to be
satisfied for a solution to be valid. The approach we adopted for melody
generation contains such hard rules, which are described in detail in Section 5.2.
A number of constraint handling techniques have been developed to deal with
such cases within evolutionary algorithms. FI-2POP [17] is a constrained evolu-
tionary algorithm that keeps two populations evolving in parallel, where feasible
solutions are selected and bred to improve their objective function values while
infeasible solutions are selected and bred to reduce their constraint violations.
Each generation, individuals are tested to see if they violate the constraints; if
so, they are moved to the "Infeasible" population, otherwise they are moved to
the "Feasible" one. An interesting feature of this algorithm is that the infeasible
population influences, and sometimes dominates, the genetic material of the op-
timal solution. Since the infeasible population is not evaluated by the objective
function it cannot get stuck in a sub-optimal solution, but it is free to explore
boundary regions, where the optimum is most likely to be found.
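To make this concrete, the generational loop of FI-2POP can be sketched as follows. This is a minimal illustration, not the authors' implementation: the genome encoding (lists of numbers), binary tournament selection and one-point crossover are our assumptions, and mutation is omitted.

```python
import random

def fi2pop_generation(population, fitness, violations, rng=None):
    """One FI-2POP generation (sketch): split the population by feasibility,
    then select and breed each sub-population toward its own goal.
    `fitness` is maximized on feasible genomes; `violations` (the number of
    violated constraints) is minimized on infeasible ones."""
    rng = rng or random.Random(0)
    feasible = [g for g in population if violations(g) == 0]
    infeasible = [g for g in population if violations(g) != 0]

    def breed(group, score, maximize):
        # Binary tournament selection + one-point crossover (mutation omitted).
        if len(group) < 2:
            return list(group)
        best = max if maximize else min
        children = []
        for _ in range(len(group)):
            pa = best(rng.sample(group, 2), key=score)
            pb = best(rng.sample(group, 2), key=score)
            cut = rng.randrange(1, len(pa))
            children.append(pa[:cut] + pb[cut:])
        return children

    # Offspring are re-partitioned at the next generation's feasibility test,
    # which is how genetic material migrates between the two populations.
    return breed(feasible, fitness, True) + breed(infeasible, violations, False)
```

Note how the infeasible offspring are never scored by `fitness`: they are driven only toward fewer constraint violations, which is what lets them explore the boundary regions mentioned above.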
Fig. 1. Steps for generating a composition.
3 MetaCompose
structure of FI-2POP, but the objective function of the feasible population is
substituted with the NSGA-II algorithm. Section 5.2 describes an application
of this approach to the evolution of melodies.
5 Composition Generation
A composition in this paper consists of a chord sequence, a melody and an
accompaniment. It is worth noting that the accompaniment is only an abstraction,
not a complete score of a possible accompaniment; it is described in detail in
Section 5.3 below. The main reason for the deconstruction of compositions is
that we want a general structure (an abstraction) that makes music recogniz-
able and gives it identity. Generating abstractions, which themselves lack some
information that one would include in a classically composed piece of music
(e.g. tempo, dynamics), allows the system to modify the music played in real-time
depending on the affective state the game-play wishes to convey. The generation
of compositions is a process with multiple steps: (i) create a chord sequence, (ii)
evolve a melody fitting this chord sequence, and (iii) create an accompaniment
for the melody/chord sequence combination (see Fig. 1).
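The three steps above can be sketched as a simple sequential driver; the parameter names below are placeholders for the actual generators, not the authors' API.

```python
def generate_composition(make_chord_sequence, evolve_melody, make_accompaniment):
    """Sequential composition pipeline: (i) generate a chord sequence,
    (ii) evolve a melody that fits it, (iii) derive an accompaniment
    abstraction from the melody/chord-sequence combination."""
    chords = make_chord_sequence()
    melody = evolve_melody(chords)
    accompaniment = make_accompaniment(chords, melody)
    return {"chords": chords, "melody": melody, "accompaniment": accompaniment}
```

The point of the sketch is the data flow: each stage consumes only the artifacts of the previous stages, so the chord sequence anchors both the melody and the accompaniment.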
Constraints We have three constraints: a melody should (i) not have leaps
between notes bigger than a fifth, (ii) contain at least a minimum proportion of
intervals of a second (50% in the current implementation) and (iii) have each
note pitch different from the preceding one.
\[
\text{Feasibleness} = -\sum_{i=0}^{n-1} \bigl( \text{Second}(i, i+1) + \text{BigLeap}(i, i+1) + \text{Repeat}(i, i+1) \bigr), \quad (1)
\]
where n is the genome length.
The three functions comprising equation 1 are all boolean functions that return
either 1 or 0, depending on whether the two notes at the specified indexes of the
genome violate the corresponding constraint. As can be seen, this function returns
a number ranging from (potentially) −∞ to 0, where a score of 0 means that
the individual satisfies all the constraints and, consequently, can be moved from
the infeasible population to the feasible one.
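One possible reading of equation 1 in code, with notes encoded as diatonic scale degrees (integers, so a fifth spans four degrees). Treating the 50%-seconds constraint as a count of "missing" seconds is our interpretation, not necessarily the authors' exact formulation.

```python
def feasibility(genome):
    """Negated count of constraint violations (a reading of eqn 1):
    0 means fully feasible; more negative means more violations.
    Notes are diatonic scale degrees, so a fifth spans 4 degrees."""
    pairs = list(zip(genome, genome[1:]))
    big_leaps = sum(1 for a, b in pairs if abs(a - b) > 4)  # leap > a fifth
    repeats = sum(1 for a, b in pairs if a == b)            # repeated pitch
    seconds = sum(1 for a, b in pairs if abs(a - b) == 1)   # step-wise moves
    # Constraint (ii): at least 50% of the intervals must be seconds.
    missing_seconds = max(0, (len(pairs) + 1) // 2 - seconds)
    return -(big_leaps + repeats + missing_seconds)
```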
Regarding the constraints in eqn 1: leaps larger than a fifth do appear in music,
but they are avoided here, as experience suggests they are too hard on the ear of
the listener; if the listener is not properly prepared, such leaps can easily break
the flow of the melody. We also specify a minimum number of intervals of a second
(the smallest interval possible in a diatonic context such as this one; see the
Genome Representation section) because a melody with too many larger leaps feels
unstructured, and not something we would normally hear or expect a voice to sing.
Finally, the constraint on repetition of notes is justified by the fact that
repeated notes will be introduced in the real-time interpretation of the
abstraction.
\[
\text{CounterStep} = \frac{\sum_{i=0}^{n-1} \text{IsLeap}(i, i+1)\,\bigl(\text{PreCounterStep}(i, i+1) + \text{PostCounterStep}(i, i+1)\bigr)}{\text{leapsN}} \quad (2)
\]

\[
\text{ChordOnLeap} = \frac{\sum_{i=0}^{n-1} \text{IsLeap}(i, i+1)\,\bigl(\text{BelongsToChord}(i) + \text{BelongsToChord}(i+1)\bigr)}{\text{leapsN}} \quad (3)
\]

\[
\text{FirstNoteOnChord} = \frac{\sum_{i=0}^{n} \text{IsFirstNote}(i) \times \text{BelongsToChord}(i)}{\text{chordsN}} \quad (4)
\]
First, we remind the reader that the definition of an interval in music theory is
the difference between two pitches. In Western music, intervals are mostly
differences between notes belonging to the diatonic scale (for example, in
a C major key, the interval between C and D is a second, the interval
between C and E is a third, and so on).
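In code, the conventional interval number between two diatonic degrees is simply one more than the degree distance. A small sketch, assuming degree 0 = C, 1 = D, and so on within a C major key:

```python
def diatonic_interval(a, b):
    """Interval number between two diatonic scale degrees:
    adjacent degrees form a second, a distance of two a third, etc."""
    return abs(a - b) + 1
```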
To clarify what counter step-wise motion means: consider a leap of
a fifth from C to G, as in Fig. 3 (assuming we are in a C major key). This is
an upward movement from a lower note to a higher one; a counter step-wise
approach would mean that the C is preceded by a higher note (creating
a downward movement) at an interval of a second, so a D. Likewise, following
Fig. 3. Example of counter step-wise approach and departure from a leap (C-G).
the leap in a counter step-wise motion means creating a downward movement of a
second after the G, so an F must follow. The reason we introduce this objective
is that this simple technique makes leaps much easier on the listener's ear;
otherwise they often sound too abrupt, as they suddenly change the range of
notes the melody is playing. The PreCounterStep and PostCounterStep functions
are boolean functions that check, respectively, whether the note preceding or
following the leap approaches or departs from it with a contrary interval of a
second.
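Our reading of these checks in code, using the paper's D-C-G-F example (pitches as diatonic degrees). The boundary handling at the ends of the melody is an assumption.

```python
def counter_step_score(melody):
    """Eqn (2) logic: for each leap, count whether it is approached and/or
    left by a step of a second contrary to the leap's direction, averaged
    over the number of leaps (so each leap contributes 0, 1 or 2)."""
    leaps = 0
    hits = 0
    for i in range(len(melody) - 1):
        a, b = melody[i], melody[i + 1]
        if abs(a - b) <= 1:            # unison or second: not a leap
            continue
        leaps += 1
        direction = 1 if b > a else -1
        # PreCounterStep: the note before the leap lies a second away,
        # against the leap direction (e.g. D before an upward C-G leap).
        if i > 0 and melody[i - 1] - a == direction:
            hits += 1
        # PostCounterStep: the note after the leap steps back contrary
        # to the leap (e.g. F after the G).
        if i + 2 < len(melody) and melody[i + 2] - b == -direction:
            hits += 1
    return hits / leaps if leaps else 0.0
```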
The reason for having the leap notes – the two notes that form a leap bigger
than a second – be part of the underlying chord is that leaps are intrinsically
more interesting than step-wise motion; this means that the listener
unconsciously considers them more meaningful and pays more attention to them.
If these leaps contain notes that have nothing to do with the underlying chord,
even if they present no real dissonances, they will be perceived as dissonant
because they create unexpected intervals with the chord notes. Including them
as part of the chord gives a better sense of coherence that the listener will
perceive as pleasant. The last objective simply underlines the importance of
the first note after a chord change: by playing a note that is part of the
chord we reinforce the change and make it sound less discordant.
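The two chord-related objectives can be sketched as follows. Here `in_chord` and `is_first_note` are hypothetical predicates over melody indexes standing in for the paper's BelongsToChord and IsFirstNote; the chord-tracking bookkeeping itself is omitted.

```python
def chord_on_leap(melody, in_chord):
    """Eqn (3): average number of leap endpoints (0, 1 or 2 per leap)
    that belong to the underlying chord."""
    leap_pairs = [(i, i + 1) for i in range(len(melody) - 1)
                  if abs(melody[i] - melody[i + 1]) > 1]
    if not leap_pairs:
        return 0.0
    return sum(int(in_chord(i)) + int(in_chord(j))
               for i, j in leap_pairs) / len(leap_pairs)

def first_note_on_chord(melody, is_first_note, in_chord, chords_n):
    """Eqn (4): fraction of chord changes whose first melody note
    belongs to the new chord."""
    return sum(1 for i in range(len(melody))
               if is_first_note(i) and in_chord(i)) / chords_n
```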
Note that these objectives, by the nature of multi-objective optimization,
will generally not all be completely satisfied. This is fine, because satisfying all
objectives might make the generated music sound too mechanical, while these
are “soft” rules that we want to enforce only to a certain point (contrary to the
constraints of the infeasible population, which always need to be satisfied).
The rhythm presented in the final music will be modified by the real-time
affective music composer for variety or for affect expression, while still
maintaining a rhythmic and harmonic identity characteristic of the composition.
6 Experiment design
Where the last word (e.g. "pleasing") depends on the criterion. We also
include the more neutral answers "Neither" and "Both Equally" to avoid
randomness in the data from participants who cannot decide which clip satisfies
the evaluation criterion better or worse. Other benefits of doing this are
avoiding participant frustration and gathering possibly interesting
information on cases where the pieces are considered equally good/bad.
Note that for the first five questions in the survey instrument, the pair-wise
clips always included one clip from the complete generator. After five trials,
the clip pairs are picked at random from all the groups. In this way, we hoped
to collect enough data to make some observations about the four "broken"
generators as well. The motivation behind this design choice is that our main
question is evaluating the complete generator against all possible alternatives,
so attention to the complete music generator has priority. This also has a
practical justification: with the number of groups we have (five), testing all
possible combinations and gathering enough data would be nearly impossible.
The survey has no pre-defined end: participants can continue answering for as
long as they want, and can close the online survey at any time or navigate
away from it without data loss. In the preamble, we encouraged participants
to do at least five comparisons.
Five groups were examined, as dictated by the architecture of our generator.
For each of these five groups, 10 pieces of music are created. For the sake of
this experiment the affect expression has been kept to a neutral state for all
the groups, and we used the same algorithms to improvise on the composition
abstraction. There is therefore no test of the music generator's affect
expression, but rather of the music quality from the complete architecture
compared to the architectural alternatives. The clips for the various groups
can be accessed at http://msci.itu.dk/evaluationClips/
The data collected amounts to a total of 1,291 answers for each of the four
evaluation criteria from 298 participants. Of the survey trials generated, 1,248
contained a clip generated with our complete music generator algorithm (A).
Table 1 shows how many responses were obtained for each criterion and how
many neutral answers were collected.
Table 1. Numbers of correct, incorrect and neutral answers to our criteria for the
complete generator against all the "broken" generators (B-E) combined. Keep in mind
that in the case of the random criterion the listener is asked to select the clip
that he/she feels is the most random, so it is entirely expected that a low number
of participants choose the random clip (E) against the complete generator (A).
For now we consider only definitive answers (where the participant chooses one
of the music clips presented); we will look at the impact of the neutral answers
at the end of this section. Under this restriction the data becomes boolean:
the answers are either "the user chose the clip from the complete generator (A)"
or "the user chose the clip from a broken generator (B-E)". To analyze this data
we use a two-tailed binomial test, an exact test of the statistical significance
of deviations from a theoretically expected random distribution of observations
into two categories. The null hypothesis is that both categories are equally
likely to occur and, as we have only two possible outcomes, that probability is 0.5.
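The exact two-tailed binomial test can be computed directly from the binomial probability mass function. A self-contained sketch (the counts in the usage note are made up; the paper's actual totals per criterion are in Tab. 1):

```python
from math import comb

def binomial_two_tailed(k, n, p=0.5):
    """Exact two-tailed binomial test: the probability, under the null
    hypothesis, of an outcome at least as unlikely as observing k
    successes in n trials (the 'method of small p-values')."""
    pmf = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-9)   # tolerance for float rounding
    return min(1.0, sum(q for q in pmf if q <= threshold))
```

For example, if 70 of 100 definitive answers preferred one clip, `binomial_two_tailed(70, 100)` is well below 0.001, so the null hypothesis of equal preference would be rejected at any conventional significance level.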
Firstly, let us consider the combined results of all the "broken" groups (B-E)
against the complete generator (A): as can be seen from Tab. 1, we have
statistically highly significant differences for the pleasing, random and
harmonious categories, while we have a p-value of 0.078 for the interesting
category. This means that we can reject the null hypothesis and infer a
difference in distribution between choosing the music generated by the complete
algorithm (A) and the "broken" ones (B-E).
We can affirm that our generator (A) ranked better than all the other versions
(B-E) for three of our four criteria; the exception is interestingness, where
there is no statistically significant difference. Interestingness is clearly a
very subjective measure, and this may explain the result. Moreover, the ratio
of neutral answers obtained for this criterion is almost 26%, a much higher
neutral response rate than for the other criteria. This suggests that in many
cases participants could not say which composition they found more interesting.
A possible explanation is that, as the affect expression (which also includes
musical features such as tempo and intensity) is held in a neutral state, equal
for all pieces, after hearing a number of clips the listener does not find much
to surprise him/her. Also, the duration of the generated pieces (30 s) might
not allow sufficient time to judge interestingness.
Table 2. Answers and results of the binomial test for pairs comprised of the full
generator and the one with random chord sequences.
Table 3. Answers and results of the binomial test for pairs comprised of the full
generator (A) and the one with unconstrained random melody (C).
Table 4. caption
If we consider only the pairs that included the complete generator (A) and the
one with random chord sequences (B) (Tab. 2), we again obtain statistically
significant differences in the distribution of the answers for the pleasing,
random and harmonious criteria. In this case we have a very high p-value for
interestingness (more than 0.5); in fact the complete generator (A) and the
"broken" one (B) received the same amount of preference. We can explain this
by considering that the disruptive element introduced by this modification of
the algorithm is mitigated by the fact that the rest of the system tries to
create music as pleasing as it can from the chord sequence it is given. So,
most of the time, the music will not have notes that sound out of key or that
fit poorly with the chord sequence. Still, we can observe that the listener is
capable of identifying that, while the piece does not sound discordant or
dissonant, it lacks the structure of tension-building and tension-releasing;
this explains why the complete generator (A) is preferred on all other criteria.
It is interesting to note that the very act of presenting the listener with
uncommon chord sequences does create an increase in the interestingness of
the music.
Table 6. Answers and results of the binomial test for pairs comprised of the full
generator (A) and the one with random accompaniment (E).
The results for the constrained random melody generation (D) are more
interesting (Tab. 5). First, we notice no statistically significant values for
the pleasing and interesting criteria. This is explained by the fact that the
melody never goes off key, so it never presents off-key notes and never sounds
abruptly "wrong" to the listener's ear. Yet the random and harmonious criteria
are statistically significant. Recalling how we described these criteria, we
notice that the more objective criteria (random and harmonious) are those that
demonstrate a difference in distribution. We believe this reinforces that,
although compositions in this group never achieve a bad result, the listener
is still able to identify the lack of structure (randomness) and the melody's
lack of consideration of the underlying chords (harmoniousness). An example of
the first case would be a melody that jumps a lot between very different
registers; this would make the melody sound more random than the melodies we
evolve using (A), which follow more closely the guidelines of a singing voice.
Harmoniousness can be influenced by the fact that, over a chord (expressed by
the accompaniment), the melody can play notes that create intervals that
"muddle" the clarity of the chord to the listener's ear.
Finally, the last group, random accompaniment generation (E), gives us very
clear statistically significant results on all criteria (Table 6). Much of the
harmony expression depends on the accompaniment generation, and when this is
randomized it is no wonder that the piece sounds confusing and discordant.
This is reflected in the trial data.
8 Conclusion and future work
This paper describes a new component-based system for music generation, based
on creating an abstraction of musical structure that supports real-time
improvisation. We focus on the method of creating the abstractions
("compositions"), which consists of the sequential generation of (i) a chord
sequence, (ii) a melody and (iii) an accompaniment abstraction. We describe in
detail our novel approach to evolving melodies: a non-dominated sorting genetic
algorithm with two feasible-infeasible populations (NSFI-2POP).
Returning to the main question of the paper (do all parts of the music
generation system add to the music produced?), we performed an extensive
quantitative study to validate our music generation approach. The main
objective is to investigate the contribution of each component of the framework
to the quality of the music created. To do this, we systematically switched off
components of our generator and replaced them with random generation; this
method was inspired by the "ablation studies" performed by, e.g., Stanley and
Miikkulainen [24]. From these random "broken" compositions and the complete
algorithm we created various pair-wise samples to test against each other. In
particular we examined four broken groups: random chord sequences, random
melody constrained (to the key of the piece), random melody unconstrained and
random accompaniment. The music clips generated by these "broken" versions
were compared to music clips created by the complete algorithm according to
four criteria: pleasantness, randomness, harmoniousness and interestingness.
Analysis of the data supports the assertion that participants prefer the
complete system to the alternatives offered in three of the four criteria
(pleasantness, randomness and harmoniousness). The results for the
interestingness criterion are not definitive, but suggest that some parts of
our generator have a higher impact on this criterion than others. It is also
noteworthy that there is no statistically significant difference in preference
between the constrained melody group (D) and the complete generator (A) for
the pleasantness criterion.
Future work will focus on developing and evaluating the affect-expression
capabilities of the system; we will likely follow the methodology described
by Scirea et al. for characterizing control parameters through crowd-sourcing
[25]. In summary, we have (i) shown how each part of our music generation
method helps create music that the listener finds more pleasant and structured,
and (ii) presented a novel GA method for constrained multi-objective
optimization.
References
1. Papadopoulos, G., Wiggins, G.: AI methods for algorithmic composition: A survey,
a critical view and future prospects. In: AISB Symposium on Musical Creativity,
Edinburgh, UK (1999) 110–117
2. Konečni, V.J.: Does music induce emotion? a theoretical and methodological anal-
ysis. Psychology of Aesthetics, Creativity, and the Arts 2(2) (2008) 115
3. Yannakakis, G.N., Togelius, J.: Experience-driven procedural content generation.
IEEE Transactions on Affective Computing 2(3) (2011) 147–161
4. Miranda, E.R.: Readings in music and artificial intelligence. Volume 20. Routledge
(2013)
5. Wooller, R., Brown, A.R., Miranda, E., Diederich, J., Berry, R.: A framework for
comparison of process in algorithmic music systems. In: Generative Arts Practice
2005 — A Creativity & Cognition Symposium. (2005)
6. Robertson, J., de Quincey, A., Stapleford, T., Wiggins, G.: Real-time music gener-
ation for a virtual environment. In: Proceedings of ECAI-98 Workshop on AI/Alife
and Entertainment, Citeseer (1998)
7. Smaill, A., Wiggins, G., Harris, M.: Hierarchical music representation for compo-
sition and analysis. Computers and the Humanities 27(1) (1993) 7–17
8. Loughran, R., McDermott, J., O’Neill, M.: Tonality driven piano compositions
with grammatical evolution. In: IEEE Congress on Evolutionary Computation
(CEC), IEEE (2015) 2168–2175
9. Dahlstedt, P.: Autonomous evolution of complete piano pieces and performances.
In: Proceedings of Music AL Workshop, Citeseer (2007)
10. Miranda, E.R., Biles, A.: Evolutionary computer music. Springer (2007)
11. Rigopulos, A.P., Egozy, E.B.: Real-time music creation system (May 6 1997) US
Patent 5,627,335.
12. Meier, S.K., Briggs, J.L.: System for real-time music composition and synthesis
(March 5 1996) US Patent 5,496,962.
13. Livingstone, S.R., Brown, A.R.: Dynamic response: Real-time adaptation for mu-
sic emotion. In: Proceedings of the 2nd Australasian Conference on Interactive
Entertainment. (2005) 105–111
14. Brown, D.: Mezzo: An adaptive, real-time composition program for game sound-
tracks. In: Proceedings of the AIIDE 2012 Workshop on Musical Metacreation.
(2012) 68–72
15. Zitzler, E., Deb, K., Thiele, L.: Comparison of multiobjective evolutionary algo-
rithms: Empirical results. Evolutionary computation 8(2) (2000) 173–195
16. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective
genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2)
(2002) 182–197
17. Kimbrough, S.O., Koehler, G.J., Lu, M., Wood, D.H.: On a feasible–infeasible
two-population (FI-2POP) genetic algorithm for constrained optimization: Distance
tracing and no free lunch. European Journal of Operational Research 190(2) (2008) 310–327
18. Deb, K., Pratap, A., Meyarivan, T.: Constrained test problems for multi-
objective evolutionary optimization. In: Evolutionary Multi-Criterion Optimiza-
tion, Springer (2001) 284–298
19. Chafekar, D., Xuan, J., Rasheed, K.: Constrained multi-objective optimization
using steady state genetic algorithms. In: Genetic and Evolutionary Computation
(GECCO 2003), Springer (2003) 813–824
20. Jimenez, F., Gómez-Skarmeta, A.F., Sánchez, G., Deb, K.: An evolutionary algo-
rithm for constrained multi-objective optimization. In: Proceedings of the Congress
on Evolutionary Computation, IEEE (2002) 1133–1138
21. Isaacs, A., Ray, T., Smith, W.: Blessings of maintaining infeasible solutions for
constrained multi-objective optimization problems. In: IEEE Congress on Evolu-
tionary Computation, IEEE (2008) 2780–2787
22. Mugglin, S.: Chord charts and maps. http://mugglinworks.com/chordmaps/chartmaps.htm
Accessed: 2015-09-14.
23. Deb, K.: Multi-objective optimization using evolutionary algorithms. Volume 16.
John Wiley & Sons (2001)
24. Stanley, K.O., Miikkulainen, R.: Evolving neural networks through augmenting
topologies. Evolutionary computation 10(2) (2002) 99–127
25. Scirea, M., Nelson, M.J., Togelius, J.: Moody music generator: Characterising
control parameters using crowdsourcing. In: Evolutionary and Biologically Inspired
Music, Sound, Art and Design. Springer (2015) 200–211