Babaisyou Generation
Babaisyou Generation
Babaisyou Generation
Abstract—We present a collaborative mixed-initiative system levels for the puzzle game Baba is You (Arvi Teikari, 2019).
for building levels for the puzzle game “Baba is You”. Unlike The system includes features for editing levels, automatically
previous mixed-initiative systems, Baba is Y’all is designed for playtesting levels, helping design levels through an evolution-
collaborative asynchronous creation by multiple users over the
internet. The system includes several AI-assisted features to help ary algorithm, rating levels, and suggesting novel levels to
designers, including a level evolver and an automated player design. All features are built around a central level archive
for playtesting. The level archives catalogues levels according to which is structured like the map of elites from the MAP-Elites
which mechanics are implemented and not implemented, allowing algorithm. We also report preliminary results from an informal
the system to ask users to design levels with specific combinations user study, shining some light on how the system can be used.
of mechanics. We describe the operation of the system and
the results of small-scale informal user test, and discuss future II. BACKGROUND
development paths for this system as well as for collaborative
mixed-initiative systems in general. A. Procedural Content Generation
Index Terms—PCG, Level Generation, Mixed-Inititive, Evolu- Procedural Content Generation is the process of using a
tionary Computation, Quality Diversity
computer program to create content [4]. These techniques have
I. I NTRODUCTION been used since the early days of computer games and still
remain a popular technique today. PCG is typically divided
How to best design game content together with content gen-
based on the technology behind the generation process into
eration algorithms is a hard and important question. A number
three main categories: Constructive techniques [4], Search-
of prototype systems for mixed-initiative design have been
Based techniques [5], and Machine Learning techniques [6].
created to showcase ways in which humans and algorithms can
Search based approaches are more common in academia for
design game content together [1, 2, 3]. Many different modes
their generality and ease of use. These techniques use a
of interaction have been devised, including those where the
search/optimization algorithm to search the space of potential
computer program provides suggestions to the human designer,
content for good artifacts. A fitness function is defined in order
evaluates their output, tests for playability, etc. However, all
to guide the search. Previously, researchers have used it to
of these systems for AI-assisted game content generation are
generate decorations such as flowers [7], game mechanics such
geared towards a single user.
as bullet patterns [8], or entire game levels [2, 9], etc.
In this paper, we address the challenge of AI-assisted
With the advancement in quality-diversity search based
collaborative game content creation, that is, where multiple
methods [10], game researchers have started to focus on using
users interact with a procedural content generation system
it in more projects [11]. Quality-diversity techniques are search
to create game content. Similar to an open source system
based techniques that try to generate a set of diverse solutions
like Wikipedia, there would be a central content repository,
with a high quality. A well-known example is Map-Elites [12],
where anyone could make a contribution to the content.
an evolutionary algorithm that uses a multi-dimensional map
But additionally, the system should help users create content
instead of a population to maintain its results. This map is
through various AI functionalities, such as providing sugges-
constructed by dividing the solution space into a group of cells
tions, testing, and feedback. Most importantly, the initiative for
based on defined behavior characterstics. Any new solution
design should be mixed. For example, the system might ask
found will have to evaluated for its location in the map and
users to design specific types of content that it thinks should
placed in the correct cell. In most situations, solutions in
be designed, or to test or evaluate artifacts that others might
the same cell compete within the population and only the
have designed.
In this paper, we describe a prototype system for col- fittest individual survives. Because of the map maintaince
laborative mixed-initiative level design, where users design and the cell competition, Map-Elites can guarantee a map of
diverse and high quality solutions, after a finite number of
iterations. These properties of Map-Elites have been used in
978-1-5386-5541-2/18/$31.00 ©2018 IEEE other research topics throughout academia [9, 13, 14, 15, 16].
B. AI Mixed Initiative in Games
AI mixed initiative process is human and AI collaboration
together on a creative task. This task can include painting [17,
18], designing shoes [19], composing music [20], etc. In this
paper, we are focusing only on collaboration in the context
of level design. Researchers have experimented with different
approaches to find the best way of collaboration between
human and machines. One of the earliest approaches involved
using search based evolutionary algorithms to suggest content (a) ‘Flag is Win’ is satisfied (b) ‘Flag is Win’ is not sat-
edits for the user based on a specific fitness function [1, 3, 21]. at the start isfied at the start
A drawback of using evolutionary algorithms is that it doesn’t
provide much diversity between the suggestions; which can Fig. 1: Simple ‘Baba is You’ levels with different satisfied
become very limiting during collaboration if the suggestion starting rules.
is not ideal. Similarly, machine learning methods suffer from
a similar problem, as the trained model usually suggests one movements of pushing word blocks around the map. Since
solution. The advantage of using machine learning techniques the mechanics act as constraints and solutions for the player,
is the ability to modify models using active learning to cater the game provides an interesting study for procedural level
more to the user’s personality and style [22, 23]. generation where game levels and game rules are intertwined.
Liapis et al. has tried to solve this problem by introducing Every ‘Baba is You’ level must have a win condition and a
several different suggestions that improve different fitness controllable character that can reach the defined win condition
functions during the design of 2D strategy maps [2]. Unfortu- in order to successfully solve the level. These two rules are
nately, the system was evaluating multiple different functions defined by ‘X-IS-WIN’ and ‘Y-IS-YOU’ respectively - where
making it slow and hard to scale. Alvarez et al. started using X and Y are any type of game objects that can exist in the
constrained Map-Elites [9] towards designing a collaborative game level. Figure 1 shows two simple levels from ‘Baba is
tool to designing top down dungeons [13, 14]. You’. Figure 1a shows a simple level where the player controls
C. Open source projects the ‘Baba’ object (with the ‘Baba-IS-YOU’ rule is currently
active) and tries to reach the ‘Flag’ object to win the level
Open source collaborative projects are projects that allow
(with the ‘Flag-IS-WIN’ rule is active at the start of the level).
users to contribute specific content to create the improvement
The rules do not have to be initially made at the start of the
of a system. The loose structure of open source projects can
game and can be manipulated at any point while the player is
discourage users from engaging with it. As such, they only
attempting to solve the level. Figure 1b shows the same level
join when necessary and have time. Brito et al. suggests
as figure 1a but the rule ‘Flag-IS-WIN’ is not satisfied at the
gamifing the process in order to have more engagement with
beginning. The player will need to push the word ‘Flag’ to
the sytem and provided a conceptual framework on how to do
complete the sentence ’Flag-IS-WIN’ - thereby activating that
so in crowd sourcing projects [24]. Several research projects
rule so they can win the level.
gamified the process in order to guarantee user participation
Rules for the level are defined by statements readable as
and collect more data [25, 26, 27].
English text and read from up-down and left-right. So ’X-IS-
In games, this is not the case; most games are closed source.
YOU’ would be interpreted as a valid rule, but not ’YOU-IS-
Developers can invest thousands of collective hours developing
X.’ As such, all rules must take one of these 3 formats:
a game over the course of multiple years without sharing the
• X-IS-(KEYWORD) where KEYWORD belongs to a
data. Very few commercial developers allow users to build
their own content and share ideas through their system such word in the keyword word class that manipulates the
as Super Mario Maker (Nintendo, 2015), Skyrim (Bethesda, property of the game object class X (i.e. ’WIN’, ’YOU’,
2011), Time Fcuk (Edmund McMillen, 2009), etc. From a ’MOVE’, etc.)
• X-IS-X a reflexive rule stating that the game object class
research perspective, very few projects use open sourcing to
generate more game content. Barros et al.’s work [28] is known ’X’ cannot be changed to another game object class.
• X-IS-Y a transformative rule changing all game objects
for using crowd-sourced data such as Wikipedia to create a
point-click adventure game similar to the game ”Where in the of class X into game objects of class Y.
World is Carmen Sandiego?” (Broderbund Software, 1985). The objects themselves ultimately do not matter in the game.
The project was able to benefit from the large corpus of open- For example, a player can control Baba, Keke, walls, or any
data to generate interesting adventure games. other object or multiple objects at once in the game so long as
it can be represented by a physical object(s) and the rule ’X-
D. Baba is You IS-YOU’ is connected where X represents the physical object
Baba is You (Arvi “Hempuli” Teikari, 2019) is a puzzle that the player is currently controlling. This applies to all
game where players can manipulate the rules of a level rules created initially before the player starts to solve the level
and properties of the game objects through Sokoban-like and any rules that are created or removed while the player is
solving the level. Game object classes that are affected by any
rule have their properties updated accordingly. For example,
creating the rule ’KEKE-IS-MOVE’ within the game would
make all KEKE objects start autonomously moving within
the game after each action is taken thereafter. If the rule is
removed after a future action is taken, all KEKE objects would
stop moving subsequently.
Fig. 2: Example of an ascii representation of a level to a render
III. BABA IS Y’ ALL of the level.
1
Baba is Y’all is an Open Source Mixed initiative tool
that allows users to build their own Baba is You levels and
save them online. The base functionality of editing, saving • Map Module: responsible for maintaining the different
and playing others’ levels is similar to Super Mario Maker game levels and suggests new levels to play or make.
(Nintendo, 2015), though of course not as advanced. The These modules communicate with each other based on the
system implements the jam version 2 of Baba is You and not sequence of actions the user makes while navigating the site.
the more full-fledged commercial version as the jam version
has enough variety in game mechanics that allows the users A. Game Module
to build different levels. However, Baba is Y’all is not just The game module is responsible for simulating a Baba is
an ordinary level editor that saves the levels online. The You level. It also allows users to test the playability of levels
level archive is structured as a quality-diversity evolutionary either by directly playing through the level themselves or by
algorithm [10], specifically MAP-Elites [12]. The MAP-Elites allowing a solver agent to attempt to solve it. This component
algorithm will allow us to save, organize, retrive, suggest levels is used in the Editor module, the Generator module, and the
easily as they can be stored in a MAP cell corresponding to rating page. After a level has been saved to the database, users
their Behavior Characteristics. can attempt to solve the level in fewer moves than what was
The overall process of users editing levels and building originally solved by the creator.
on each others’ level can be seen as an instantiation of this The module receives an input level in the form of a 2D
algorithm, which seeks to improve the overall quality of the array of characters where each character represents a different
saved levels while maintaining the diversity of the levels. game sprite. Figure 2 shows an example input ascii level and
Every time a user submits a new level, the system saves it its corresponding game level with different game sprites. Game
with mechanically similar levels (levels that use similar rules sprites are divided into two main different classes: object class
in their solutions) where they will be rated with respect to and keyword class. Object class are the actual game object
each other. Later, when a user is trying to play a new level that game rules manupilate and they are the sprites that can
that is mechanically different (levels that need the player to use modify the game state, while keyword class defines the actual
different rules to solved them) from what they have previously rules of the level. For example, figure 2 shows five different
play, the system can provide a different level with sufficient keyword class sprites (BABA, IS, YOU, FLAG, and WIN).
quality. The system also pushes the users towards designing These sprites are arranged in two rules: ‘BABA IS YOU’
levels that lack a specific mechanic so players can always allowing the player to control all the Baba objects and ‘FLAG
find something new and interesting to play. Lastly, the system IS WIN’ indicating that reaching any flag object will make
provides a simple evolutionary algorithm as part of the creation the player win the level. The system has a total of 32 different
process to help inspire designers. sprites: 11 object classes and 21 keyword classes.
To make sure the system is simple and easy to access by Because the game allows rule manipulation, object classes
everyone, we decided to design it as a web application where mean little in the game except for providing variety of objects
anyone with a broswer can access it. We modularized the for rules to affect and/or aesthetic pleasure. Because the game
system into four main modules, to make it easy to maintain rules are always changing, the system keeps track at every
and update at any time. state of all the active rules. Once the win condition has been
• Game Module: the core game framework where the user
met, the game module records the current solution, the active
can test game levels by solving it themselves or by using rules at the start of the level, and the active rules when the
an AI solver. solution has been reached. These properties are saved to be
• Editor Module: the level editor where the user can
used and interpreted by the Map module. The rules activated
design Baba is You levels and submit it to our map are used as the level’s characteristic feature representation and
module. saved as a chromosome to the MAP-Elites matrix.
• Generator Module: provides the user with access to an
The game module provides an AI solver called ’KEKE’
evolutionary algorithm that can help them generate or (based on one of the characters traditionally used as an
modify levels. autonomous ’NPC’ in the game). KEKE uses a best-first
1 http://game.engineering.nyu.edu/babaisyall/ tree search algorithm that tries to solve the input level. The
2 https://hempuli.itch.io/baba-is-you branching space is based on the five possible inputs a player
Fig. 3: A screenshot of the level editor page Fig. 4: A screenshot of the level evolver page
can do within the game: move left, move right, move up, move Our editor also nudges the player towards creating levels
down, and do nothing. The algorithm uses a heuristic function with a certain mechanical goals (as seen in figure 3). The
based on a weighted average of the Manhattan distance to mechanical goals are defined based on the presence of certain
the centroid distance for 3 different groups: keyword objects, active rules in the game either at the start state or at the
objects associated with the ’WIN’ rule, and objects associated winning state of the current level. The mechanical goals are
with the ’PUSH’ rule. These were chosen based on their written in form X-IS-KEYWORD such as X-IS-MOVE or X-
critical importance for the user solving the level - as winning IS-STOP. The X is never specified since any object in Baba is
objects are required to complete the level, keyword objects You can be used for anything while the KEYWORD is pre-
allow for manipulation of active rules, and pushable objects defined. A list of all the different goals will be discussed later
can directly and indirectly affect the layout of a level map and in section III-D. Any combination of goals can be represented
therefore the accessability of player objects to reach winning in the level - and unmade combinations will be presented to
objects. The heuristic function is represented by the following the user in order of simplicity. Through this process, the user
equation: and the system help each other to develop a large and diverse
h = (n + w + p)/3 (1) level pool. The user provides the system with novel levels,
while the system guides the user by telling them what levels
where h is the final heuristic value for placement in the priority are not existing in the current pool to encourage originality
queue, n is the minimum Manhatttan distance from any player and diversity. At any point, if the designer struggles with a
object to the nearest winnable object, w is the minimum concept while designing a level, they can transfer their current
Manhatttan distance from any player object to the nearest word level to the Generator module to automatically modify the
sprite, and p is the minimum Manhatttan distance from any level for them. After mutating and evolving their level through
player object to the nearest pushable object. the Generator, they can transfer their level back to the Editor
Since the system runs the solver on the user-end of the site, module to do the final edits manually before submitting it to
the solver agent has a limit of 10000 search steps to avoid the Map module.
computational strain to the user’s machine. As such, the agent
C. Generator Module
is unable to solve levels with long or complex solutions due to
its limited search space. The average limit for solution steps The Generator module is a procedural content level gener-
for KEKE has not been tested yet since the focus of this project ator. In particularly, it is an evolutionary level generator using
was to allow mixed-initiative collaboration with level design. a fitness function based on a version of tile-pattern Kullback-
Liebler Divergence (TPKLDiv3 ) algorithm [29]. Figure 4
B. Editor Module shows the current interface used by the evolver. As mentioned
before, the Evolver module can be accessed from the Editor
The editor module of the system allows human users to module at any time and can export its output back to the Editor
create their own Baba is You levels in the same vain of Super module. With this transfer process, the generation process
Mario Maker (Nintendo, 2015). Figure 3 shows the editor loses its pure procedurally generated quality. However, this
window that is available for the user. The user can place process encourages mixed-initiative interaction between the
and erase any game sprite or keyword at any location on the algorithm and the user. This also creates an ”algorithm-first”
map using the provided tools. As a basis, the user can start approach to creating levels, where the levels are generated
modifying either a blank map, a basic map (a map with X- towards a fitness value and then edited by the user later.
IS-YOU and Y-IS-WIN rules already placed with X and Y The evolver interface provides the user with multiple cus-
objects), or an elite level provided by the Map Module. Similar tomizations such as the initialization method, stopping criteria,
to Mario Maker, the created levels can only be submitted after evolution pausing, and an application of a mutation function
they are tested by the human player or the AI agent to check allowing manual user control. With these features, the user is
for solvability. For testing the level, the editor module sends not directly changing the evolution process itself, but instead
the level information to the game module to allow the user to
test it. 3 https://github.com/amidos2006/ETPKLDiv
guiding and limiting the algorithm towards generating the level TABLE I: Chromosome Rule Representation
they want. Rule Type Definition
The original ETPKLDiv algorithm uses a 1+1 evolution X-IS-X objects of class X cannot be changed to another class
strategy, a.k.a. a hillclimber, to improve the similarity between X-IS-Y objects of class X will transform to class Y
X-IS-PUSH X can be pushed
the current evolved chromosomes and a reference level. The X-IS-MOVE X will autonomously move
algorithm uses a sliding window of a fixed size to calculate X-IS-STOP X will prevent the player from passing through it
the probability of each tile configuration (called tile patterns) X-IS-KILL X will kill the player on contact
X-IS-SINK X will destroy any object on contact
in both the reference level and the evolved level and tries X-IS-[PAIR] both rules ’X-IS-HOT’ and ’X-IS-MELT’ are present
to minimize the Kullback-Liebler Divergence between them. X,Y-IS-YOU two distinct objects classes are controlled by the player
For this project, we used a window size of 3x3. This was to
maximize the probability of generating initial rules for a level,
since rules in Baba is You are made up of 3 tiles. The value s is the ratio of empty space tiles to all of the
In our project, we used 2+2 evolution strategy instead of 1+1 tiles in the level. The equation can be calculated as follows:
used by Lucas and Volz to allow slightly more diversity in the e
s= (5)
population. We also modified the fitness function to allow it t
to compare with more than one level. The fitness value also where e is the number of empty spaces in the level and t is
includes the playability of the level (p), the ratio of empty tiles the total number of tiles found in the level. The value s is
to total level tiles (s), and the ratio of unnecessary objects (u). multiplied with a value of 0.1 to avoid heavy penalization for
The final fitness equation for a level is as follows: having any empty spaces in a level and to prevent encourage-
ment for levels to mutate towards populating the level with an
f itnessnew = min(f itnessold ) + u + p + 0.1 · s (2) overabundance of similar tiles in order to eliminate any empty
where f itnessold is the Kullback-Lievler Divergence fitness space. This instead causes the algorithm to favor smaller and
function from Lucas and Volz’s work [29] compared to the more compact levels over levels with a large area.
most similar reference levels. The Generator module is not run as a back-end process
The value u is the percentage of unnecessary objects in to find more levels, instead it has to be done manually by
the map; objects that are not required or predicted to act as a the user. This is done due to the fact that some generated
constraint or solution for the level. The value can be calculated levels cannot be solved without human input. One might
as follow: wonder why not generate a huge corpus of levels and ask
o the users later to test them for the system. This could result
u= (3)
a in the system generating a multitude of levels that are either
where o is the number of objects sprites initialized in the impossible to solve or are solvable but not subjectively ”good”
level without a related object-word sprite and a is the total levels - levels the user would not find pleasing or enjoyable.
number of object sprites initialized in the level. This value This overabundance of ”garbage” levels could lead to a waste
is implemented in order to prevent noise within the level of memory and a waste human resources. By allowing the
due to having object tiles that cannot be manipulated in any user direct control over which levels are submitted from the
way or have relevancy to the level. A human-made level may generation algorithm, it still guarantees that the levels are
include these ”useless” tiles for aesthetic purposes or to give solable and with sufficient quality and promote using the tool
the level a theme - similar to the original ’Baba is You’ levels. in a mixed-initiative approach.
However, the PCG algorithm optimizes towards efficiency and D. Map module
minimalist levels, therefore ignoring the subjective aspect of
The Map module is the core module of the system. It
a level’s quality (which can be added later by the user).
provides three main functionalities: level archiving, suggesting
The value p is a binary constraint value that determines
levels to create, and suggesting levels to be played and rated. In
whether a level is potentially winnable or not. The value can
order to achieve all of these functionalities, we need an optimal
be calculated as follows:
( way of representing the relationship between the created levels.
1, has [’X-IS-YOU’ rule, ’WIN’ keyword] A way to measure similarity between levels so we can find
p= (4) the best levels between them and easily discover new levels
0, otherwise
every time. We ended up using the same structure as the Map-
This is to ensure any levels that are absolutely impossible to Elites algorithm [12] in order to guarantee quality and diversity
play or win are penalized in the population and less likely to between the levels.
be mutated and evolved from in future generations. We used When a level is submitted to be archived, the system uses
a simple playability constraint check instead of checking for the the list of active rules at the start and the end of the level
playability using the solver because the solver take time to as behavior characteristic for the input level to determine its
check for playability. Also all playable levels by the solver location in the map. There are 9 different rules checked for in
usually end up being easy levels due to the limited search each level - based on the possible rule mechanics that can be
space we are given for the best first algorithm. made in the Game module system. Table I shows the full list of
Fig. 5: A screenshot of the level matrix page