Navigating The Landscape of Multiplayer Games
https://doi.org/10.1038/s41467-020-19244-4
Multiplayer games have long been used as testbeds in artificial intelligence research, aptly referred to as the Drosophila of artificial intelligence.
1 DeepMind, Paris, France. 2 DeepMind, London, UK. 3 INESC-ID and Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal. 4 These authors contributed equally: Shayegan Omidshafiei, Karl Tuyls. ✉email: somidshafi[email protected]
Games have played a prominent role as platforms for the development of learning algorithms and the measurement of progress in artificial intelligence (AI)1–4. Multiplayer games, in particular, have played a pivotal role in AI research and have been extensively investigated in machine learning, ranging from abstract benchmarks in game theory, through popular board games such as Chess5,6 and Go7 (referred to as the Drosophila of AI research8), to realtime strategy games such as StarCraft II9 and Dota 210. Overall, AI research has primarily placed emphasis on training of strong agents; we refer to this as the Policy Problem, which entails the search for super human-level AI performance. Despite this progress, the need for a task theory, a framework for taxonomizing, characterizing, and decomposing AI tasks, has become increasingly important in recent years11,12. Naturally, techniques for understanding the space of games are likely beneficial for the algorithmic development of future AI entities12,13. Understanding and decomposing the characterizing features of games can be leveraged for downstream training of agents via curriculum learning14, which seeks to enable agents to learn increasingly-complex tasks.

A core challenge associated with designing such a task theory has recently been coined the Problem Problem, defined as “the engineering problem of generating large numbers of interesting adaptive environments to support research”15. Research associated with the Problem Problem has a rich history spanning over 30 years, including the aforementioned work on task theory11,12,16, procedurally-generated videogame features17–19, generation of games and rule-sets for General Game Playing20–26, and procedural content generation techniques27–36; we refer readers to Supplementary Note 1 for a detailed discussion of these and related works. An important question that underlies several of these interlinked fields is: what makes a game interesting enough for an AI agent to learn to play? Resolving this requires techniques that can characterize the topological landscape of games, which is the topic of interest in this paper. We focus, in particular, on the characterization of multiplayer games (i.e., those involving interactions of multiple agents), and henceforth use the shorthand of games to refer to this class.

The objective of this paper is to establish tools that enable discovery of a topology over games, regardless of whether they are interesting or not; we do not seek to answer the interestingness question here, although such a toolkit can be useful for subsequently considering it. Naturally, many notions of what makes a game interesting exist, from the perspectives of human-centric game design, developmental learning, curriculum learning, AI training, and so on. Our later experiments link to the recent work of Czarnecki et al.37, which investigated properties that make a game interesting specifically from an AI training perspective, as also considered here. We follow the interestingness characterization of Czarnecki et al.37, which defines so-called Games of Skill that are engaging for agents due to: (i) a notion of progress; and (ii) availability of diverse play styles that perform similarly well. We later show how clusters of games discovered by our approach align with this notion of interestingness. An important benefit of our approach is that it applies to adversarial and cooperative games alike. Moreover, while the procedural game structure generation results we later present target zero-sum games, due to the payoff parameterization chosen in those particular experiments, they readily extend to general-sum games.

How does one topologically analyze games? One can consider characterizations of a game as quantified by measures such as the number of strategies available, the players involved, whether the game is symmetric, and so on. One could also order the payouts to players to taxonomize games, as done in prior works exploring 2 × 2 games38–40. For more complex games, however, such measures are crude, failing to disambiguate differences in similar games. One may also seek to classify games from the standpoint of computational complexity. However, a game that is computationally challenging to solve may not necessarily be interesting to play. Overall, designation of a single measure characterizing games is a non-trivial task.

It seems useful, instead, to consider measures that characterize the possible strategic interactions in the game. A number of recent works have considered problems involving such interactions41–51. Many of these works analyze agent populations, relying on game-theoretic models capturing pairwise agent relations. Related models have considered transitivity (or lack thereof) to study games from a dynamical systems perspective52,53; here, a transitive game is one where strategies can be ordered in terms of strength, whereas an intransitive game may involve cyclical relationships between strategies (e.g., Rock–Paper–Scissors). Fundamentally, the topology exposed via pairwise agent interactions seems a key enabler of the powerful techniques introduced in the above works. In related literature, graph theory is well-established as a framework for topological analysis of large systems involving interacting entities54–56. Complexity analysis via graph-theoretic techniques has been applied to social networks57,58, the webgraph59,60, biological systems61–63, econometrics64,65, and linguistics66. Here, we demonstrate that the combination of graph and game theory provides useful tools for analyzing the structure of general-sum, many-player games.

The primary contribution of this work is a graph-based toolkit for analysis and comparison of games. As detailed below, the nodes in our graphs are either strategies (in abstract games) or AI agents (in empirical games, where strategies correspond to learned or appropriately-sampled player policies). The interactions between these agents, as quantified by the game's payoffs, constitute the structure of the graph under analysis. We show that this set of nodes and edges, also known as the α-Rank response graph49–51, yields useful insights into the structure of individual games and can be used to generate a landscape over collections of games (as in Fig. 1). We subsequently use the toolkit to analyze various games that are played by humans or wherein AI agents have reached human-level performance, including Go, MuJoCo Soccer, and StarCraft II. Our overall analysis culminates in a demonstration of how the topological structure over games can be used to tackle the interestingness question of the Problem Problem, which seeks to automatically generate games with interesting characteristics for learning agents15.

Results
Overview. We develop a foundational graph-theoretic toolkit that facilitates analysis of canonical and large-scale games, providing insights into their related topological structure in terms of their high-level strategic interactions. The prerequisite game theory background and technical details are provided in the "Methods" section, with full discussion of related works and additional details in Supplementary Note 1.

Our results are summarized as follows. We use our toolkit to characterize a number of games, first analyzing motivating examples and canonical games with well-defined structures, then extending to larger-scale empirical games datasets. For these larger games, we rely on empirical game-theoretic analysis67,68, where we characterize an underlying game using a sample set of policies. While the empirical game-theoretic results are subject to the policies used to generate them, we rely on a sampling scheme designed to capture a diverse variety of interactions within each game, and subsequently conduct sensitivity analysis to validate the robustness of the results. We demonstrate correlation between the complexity of the graphs associated with games and the complexity of solving the game itself.
[Figure 1: a two-dimensional landscape of games. Labeled points include the Blotto(5,3) through Blotto(10,5) variants, Rock–Paper–Scissors, the Disc game, Elo games (noise = 0.0, 0.1, 0.5, 1.0), Kuhn poker, the Random game of skill, the 3-move parity game, Hex and Misère Hex (board size = 3), Tic-Tac-Toe and Misère Tic-Tac-Toe, the AlphaStar league, Quoridor (board sizes 3 and 4), Go (board sizes 3 and 4), Connect Four and Misère Connect Four, the Normal Bernoulli game, and the Transitive game.]
Fig. 1 A landscape of games revealed by the proposed response graph-based workflow. This landscape is generated by collecting features associated
with the response graph of each game, and plotting the top two principal components. At a high level, games whose response graphs are characteristically
similar are situated close to one another in this landscape. Notably, variations of games with related rules are well-clustered together, indicating strong
similarity despite their widely-varying raw sizes. Instances of Blotto cluster together, despite their payoff table sizes ranging from 20 × 20 for Blotto(5,3) to
1000 × 1000 for Blotto(10,5). Games with strong transitive components (e.g., variations of Elo games, AlphaStar League, Random Game of Skill, and
Normal Bernoulli Game) can be observed to be well separated from strongly cyclical games (Rock–Paper–Scissors and the Disc game). Closely-related real-
world games (i.e., games often played by humans in the real world, such as Hex, Tic-Tac-Toe, Connect Four and each of their respective Misère
counterparts) are also clustered together.
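The landscape construction described in this caption lends itself to a compact sketch. The following is a minimal illustration assuming per-game feature vectors of response graph statistics (described later in the text) are already available; the standardization step and the use of scikit-learn's PCA are our assumptions, not details specified by the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

def games_landscape(feature_vectors):
    """Project per-game response graph features onto 2D, as in Fig. 1.

    feature_vectors: array of shape (n_games, n_features), one row per game
    (e.g., alpha-Rank entropy, cycle counts, degree statistics)."""
    X = np.asarray(feature_vectors, dtype=float)
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)  # standardize features
    return PCA(n_components=2).fit_transform(X)          # one 2D point per game
```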
In Supplementary Note 2, we evaluate our proposed method against baseline approaches for taxonomization of 2 × 2 games38. We finally demonstrate how this toolkit can be used to automatically generate interesting game structures that can, for example, subsequently be used to train AI agents.

Motivating example. Let us start with a motivating example to solidify intuitions and explain the workflow of our graph-theoretic toolkit, using classes of games with simple parametric structures in the player payoffs. Specifically, consider games of three broad classes (generated as detailed in Supplementary Methods 1): games in which strategies have a clear transitive ordering (Fig. 2a); games in which strategies have a cyclical structure, wherein all but the final strategy are transitive with respect to one another (Fig. 2d); and games with random (or no clear underlying) structure (Fig. 2g). We shall see that the core characteristics of games with shared underlying structure are recovered via the proposed analysis.

Each of these figures visualizes the payoffs corresponding to 4 instances of games of the respective class, with each game involving 10 strategies per player; more concretely, entry M(si, sj) of each matrix visualized in Fig. 2a, d and g quantifies the payoff received by the first player if the players, respectively, use strategies si and sj (corresponding, respectively, to the i-th and j-th row and column of each payoff table). Despite the variance in payoffs evident in the instances of games exemplified here, each essentially shares the payoff structure exposed by re-ordering their strategies, respectively, in Fig. 2b, e and h. In other words, the visual representation of the payoffs in this latter set of figures succinctly characterizes the backbone of strategic interactions within these classes of games, despite this structure not being immediately apparent in the individual instances visualized.

More importantly, the complexity of learning useful mixed strategies to play in each of these games is closely associated with this structural backbone. To exemplify this, consider the computational complexity of solving each of these games; for brevity, we henceforth refer to solving a game as synonymous with finding a Nash equilibrium (similar to prior works69–72, wherein the Nash equilibrium is the solution concept of interest). Specifically, we visualize this computational complexity by using the Double Oracle algorithm73, which is well-established as a Nash solver in the multiagent and game theory literature47,74–76. At a high level, Double Oracle starts from a sub-game consisting of a single randomly-selected strategy, then iteratively expands the strategy space via best responses (computed by an oracle), until discovery of the Nash equilibrium of the full underlying game.

Figure 2c, f and i visualize the distribution of Double Oracle iterations needed to solve the corresponding games, under random initializations. Note, in particular, that although the underlying payoff structures of the transitive and cyclical games (respectively visualized in Fig. 2a, d) are similar, the introduction of a cycle in the latter class of games has a substantial impact on the complexity of solving them (as evident in Fig. 2f). In particular, whereas the former class of games is solved using a low (and deterministic) number of iterations, the latter class requires additional iterations, due to the presence of cycles increasing the number of strategies in the support of the Nash equilibrium.

Workflow. Overall, the characterization of the topological structure of games is an important and nuanced problem. To address this problem, we use graph theory to build an analytical toolkit that automatically summarizes the high-level strategic interactions within a game, and provides useful complexity measures thereof.
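Since Double Oracle underlies the complexity comparisons above and below, a minimal sketch for two-player zero-sum matrix games may help; it assumes NumPy/SciPy and solves each restricted sub-game with the textbook linear-programming formulation. This illustrates the algorithm as described above, not the paper's own implementation.

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(M):
    """Nash equilibrium strategy of the row player in a zero-sum matrix game,
    via the standard linear-programming formulation."""
    n_rows, n_cols = M.shape
    A = M + (1.0 - M.min())  # shift payoffs to be strictly positive
    res = linprog(c=np.ones(n_rows), A_ub=-A.T, b_ub=-np.ones(n_cols),
                  bounds=[(0, None)] * n_rows)
    y = res.x
    return y / y.sum()

def double_oracle(M, seed=None):
    """Double Oracle on payoff matrix M (row player's payoffs, zero-sum).

    Starts from a single random strategy per player and iteratively adds
    best responses until the restricted equilibrium extends to the full game."""
    rng = np.random.default_rng(seed)
    rows = [int(rng.integers(M.shape[0]))]
    cols = [int(rng.integers(M.shape[1]))]
    iterations = 0
    while True:
        iterations += 1
        sub = M[np.ix_(rows, cols)]
        x = solve_zero_sum(sub)       # row player's restricted equilibrium
        y = solve_zero_sum(-sub.T)    # column player's restricted equilibrium
        # Oracle step: best responses computed over the FULL strategy sets.
        br_row = int(np.argmax(M[:, cols] @ y))
        br_col = int(np.argmin(x @ M[rows, :]))
        if br_row in rows and br_col in cols:
            return rows, cols, iterations  # converged: no improving deviation
        if br_row not in rows:
            rows.append(br_row)
        if br_col not in cols:
            cols.append(br_col)
```

On a purely transitive payoff matrix, this loop typically terminates after a handful of iterations, whereas cyclical payoffs push more strategies into the equilibrium support, mirroring the histograms in Fig. 2c, f, i.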
[Figure 2 panels: payoff heatmaps over strategies s0–s9 (color scale −1.0 to 1.0) for transitive games (a, b), cyclical games (d, e), and random games (g, h), alongside histograms of Double Oracle iterations to solve (c, f, i).]
Fig. 2 Motivating example of three classes of two-player, symmetric zero-sum games. a, d, and g, respectively, visualize payoffs for instances of games
with transitive, cyclical, and random structure. Each exemplified game consists of two players with 10 strategies each (with payoff row and column labels,
{s0, …, s9}, indicating the strategies). Despite the numerous payoff variations possible in each class of games illustrated, each shares the underlying payoff
structure shown, respectively, in b, e, and h. Moreover, variations in payoffs can notably impact the difficulty of solving these games (i.e., finding the Nash equilibrium), as visualized in c, f, i.
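To make the three classes concrete, the following sketch generates antisymmetric payoff matrices with the stated structures. This is one plausible construction consistent with the descriptions above; the paper's exact generators are specified in Supplementary Methods 1.

```python
import numpy as np

def transitive_game(n=10, seed=None):
    """Antisymmetric payoffs with a clear strength ordering (cf. Fig. 2a, b)."""
    rng = np.random.default_rng(seed)
    strength = np.sort(rng.uniform(-1.0, 1.0, n))
    return np.clip(strength[:, None] - strength[None, :], -1.0, 1.0)

def cyclical_game(n=10, seed=None):
    """All but the final strategy are transitively ordered; the final strategy
    beats the strong strategies and loses to the weak ones, inducing cycles
    (cf. Fig. 2d, e)."""
    rng = np.random.default_rng(seed)
    strength = np.sort(rng.uniform(-1.0, 1.0, n))
    M = np.clip(strength[:, None] - strength[None, :], -1.0, 1.0)
    M[-1, :-1] = strength[:-1]   # final strategy beats high-strength opponents...
    M[:-1, -1] = -strength[:-1]  # ...and loses to low-strength ones (antisymmetric)
    return M

def random_game(n=10, seed=None):
    """Unstructured antisymmetric payoffs (cf. Fig. 2g, h)."""
    rng = np.random.default_rng(seed)
    U = np.triu(rng.uniform(-1.0, 1.0, (n, n)), k=1)
    return U - U.T
```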
Specifically, consider again our motivating transitive game, re-visualized using a collection of graph-based measures in Fig. 3. Each of these measures provides a different viewpoint on the underlying game, collectively characterizing it. Specifically, given the game payoffs, Fig. 3b visualizes the so-called α-Rank response graph of the game; here, each node corresponds to a strategy (for either player, as this particular game's payoffs are symmetric). Transition probabilities between nodes are informed by a precise evolutionary model (detailed in the "Methods" section and Omidshafiei et al.50); roughly speaking, a directed edge from one strategy to another indicates that the players have a higher preference for the latter strategy, in comparison to the former. The response graph, thus, visualizes all preferential interactions between strategies in the game. Moreover, the color intensity of each node indicates its so-called α-Rank, which measures the long-term preference of the players for that particular strategy, as dictated by the transition model mentioned above; specifically, darker colors here indicate more preferable strategies.

This representation of a game as a graph enables a variety of useful insights into its underlying structure and complexity. For instance, consider the distribution of cycles in the graph, which play an important role in multiagent evaluation and training schemes41,50,53,77 and, as later shown, are correlated with the computational complexity of solving two-player zero-sum games (e.g., via Double Oracle). Figure 3f makes evident the lack of cycles in this particular class of transitive games; while this is clearly apparent in the underlying (fully ordered) payoff visualization of Fig. 3a, it is less so in the unordered variants visualized in Fig. 2a. Even so, the high-level relational structure between the strategies becomes more evident by conducting a spectral analysis of the underlying game response graph. Full technical details of this procedure are provided in the "Methods" section. At a high level, the so-called Laplacian spectrum (i.e., eigenvalues) of a graph, along with the associated eigenvectors, captures important information regarding it (e.g., number of spanning trees, algebraic connectivity, and numerous related properties78). Reprojecting the response graph by using the top eigenvectors yields the spectral response graph visualized in Fig. 3c, wherein similar strategies are placed close to one another. Moreover, one can cluster the spectral response graph, yielding the clustered response graph, which exposes three classes of strategies in Fig. 3d: a fully dominated strategy with only outgoing edges (a singleton cluster, on the bottom left of the graph), a transient cluster of strategies with both incoming and outgoing edges (top cluster), and a dominant strategy with all incoming edges (bottom right cluster). Finally, contracting the clustered graph by fusing nodes within each cluster yields the high-level characterization of transitive games shown in Fig. 3e.

We can also conduct this analysis for instances of our other motivating games, such as the cyclical game visualized in Fig. 4a. Note here the distinct differences with the earlier transitive game example; in the cyclical game, the α-Rank distribution in the response graph (Fig. 4b) has higher entropy (indicating preference for many strategies, rather than one, due to the presence of cycles). Moreover, the spectral reprojection in Fig. 4d reveals a clear set of transitive nodes (left side of the visualization) and a singleton cluster of a cycle-inducing node (right side). Contracting this response graph reveals the fundamentally cyclical nature of this game (Fig. 4f). Finally, we label each strategy (i.e., each row and column) of the original payoff table (Fig. 4a) based on this clustering analysis. Specifically, the color-coded labels on the far left (respectively, top) of each row (respectively, column) in Fig. 4a correspond to the clustered strategy colors in Fig. 4d. This color-coding helps clearly identify the final strategy (i.e., the bottom row of the payoff table) as the outlier enforcing the cyclical relationships in the game. Note that while there is no single graphical structure that summarizes the particular class of random games visualized earlier in Fig. 2h, we include this analysis for several instances of such games in Supplementary Note 2.
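The spectral pipeline just described (response graph, Laplacian eigenvectors, clustering, contraction) can be sketched as follows. One hedge is needed: the paper's response graph is built from the α-Rank evolutionary model detailed in the "Methods" section, whereas this sketch substitutes a simple logistic switching rule; the clustering choice (k-means) is likewise our assumption.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def response_graph(M, alpha=10.0):
    """Weighted response graph for a symmetric zero-sum game with payoffs M.

    Edge weight i -> j is a logistic function of the payoff gain of switching
    from strategy i to strategy j (a stand-in for the alpha-Rank model)."""
    gain = M.T - M  # gain[i, j]: payoff advantage of strategy j over i
    W = 1.0 / (1.0 + np.exp(-alpha * gain))
    np.fill_diagonal(W, 0.0)
    return W

def spectral_response_graph(W, dim=2):
    """Embed strategies via the low-order eigenvectors of the graph Laplacian."""
    A = 0.5 * (W + W.T)               # symmetrize for a real spectrum
    L = np.diag(A.sum(axis=1)) - A    # combinatorial Laplacian
    _, vecs = eigh(L)                 # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]         # skip the trivial constant eigenvector

def clustered_and_contracted(W, n_clusters=3):
    """Cluster strategies in the spectral embedding, then contract clusters."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(
        spectral_response_graph(W))
    C = np.zeros((n_clusters, n_clusters))
    for a in range(n_clusters):
        for b in range(n_clusters):
            if a != b:  # mean inter-cluster edge weight
                C[a, b] = W[np.ix_(labels == a, labels == b)].mean()
    return labels, C
```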
[Figure 3 panels: a payoff tensor; b α-Rank response graph; c spectral response graph; d clustered response graph; e contracted response graph; f cycle-length histogram; g graph statistics projected onto principal components (the games landscape of Fig. 1); h parametric game structure generation.]
Fig. 3 Method workflow, with accompanying transitive game results. Given the game payoffs a, the so-called α-Rank response graph of the game is
visualized in b. In c, reprojecting the response graph by using the top eigenvectors of the graph Laplacian yields the spectral response graph, wherein
similar strategies are placed close to one another. In d, taking this one step further, one can cluster the spectral response graph, yielding the clustered
response graph, which exposes three classes of strategies in this particular example. In e, contracting the clustered graph by fusing nodes within each
cluster yields the high-level characterization of transitive games. In f, the lack of cycles in the particular class of transitive games becomes evident. Finally,
in g and h, one can extract the principal components of various response graph statistics and establish a feedback loop to a procedural game structure
generation scheme to yield new games.
Fig. 4 Cyclical game results. a Game payoffs, b response graph, c cycles histogram, d spectral response graph, e clustered response graph, f contracted
response graph.
Crucially, a key benefit of this analysis is that the game structure exposed is identical for all instances of the transitive and cyclical games visualized earlier in Fig. 2a, d, making it significantly easier to characterize games with related structure, in contrast to analysis of raw payoffs. Our later case studies further exemplify this, exposing related underlying structures for several classes of more complex games.

Analysis of canonical and real-world games. The insights afforded by our graph-theoretic approach apply to both small canonical games and larger empirical games (where strategies are synonymous with trained AI agents).

Consider the canonical Rock–Paper–Scissors game, involving a cycle among the three strategies (wherein Rock loses to Paper, which loses to Scissors, which loses to Rock). Figure 5a visualizes a variant of this game involving a copy of the first strategy, Rock, which introduces a redundant cycle and thus affects the distribution of cycles in the game. Despite this, the spectral response graph (Fig. 5d) reveals that the redundant game topologically remains the same as the original Rock–Paper–Scissors game, thus reducing to the original game under spectral clustering.

This graph-based analysis also extends to general-sum games. As an example, consider the slightly more complex game of 11–20, wherein two players each request an integer amount of money between 11 and 20 units (inclusive). Each player receives the amount requested, though a bonus of 20 units is allotted to one player if they request exactly 1 unit less than the other player. The payoffs and response graph of this game are visualized, respectively, in Fig. 5g, h, where strategies, from top-to-bottom and left-to-right in the payoff table, correspond to increasing units of money being requested. This game, first introduced by Arad and Rubinstein79, is structurally designed to analyze so-called k-level reasoning, wherein a level-0 player is naive (i.e., here simply requests 20 units), and any level-k player responds to an assumed level-(k − 1) opponent; e.g., here a level-1 player best responds to an assumed level-0 opponent, thus requesting 19 units to ensure receiving the bonus units.

The spectral response graph here (Fig. 5j) indicates a more complex mix of transitive and intransitive relations between strategies.
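The 11–20 payoff table described above is easy to reproduce; a minimal sketch follows (row player's payoffs only; the game is symmetric, so the column player's table is the transpose).

```python
import numpy as np

# Row player's payoffs in the 11-20 game: request 11..20 units, receive the
# request, plus a 20-unit bonus for asking exactly one unit less than the opponent.
requests = np.arange(11, 21)
payoff = np.array([[r + (20 if r == c - 1 else 0) for c in requests]
                   for r in requests])
```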
Fig. 5 Results for redundant Rock–Paper–Scissors (RPS) and 11–20 game. In Redundant RPS, the redundant copy of the first strategy (Rock) is clustered
in the spectral response graph. In 11–20, seven clusters of strategies are revealed, exposing the cyclical nature of this game.
Fig. 6 Results for empirical games of AlphaGo and MuJoCo soccer dataset. Note that as these are empirical games, strategies here correspond to trained
AI agents. In AlphaGo, the strong transitive relationship between agents is revealed via our analysis. In MuJoCo soccer, more complex relations between
similarly-performing agents are revealed in the clusters produced.
Notably, the contracted response graph (Fig. 5l) reveals 7 clusters of strategies. Referring back to the rows of payoffs in Fig. 5g, relabeled to match cluster colors, demonstrates that our technique effectively pinpoints the sets of strategies that define the rules of the game: weak strategies (11 or 12 units, the first two rows of the payoff table, evident in the far-right of the clustered response graph), followed by a set of intermediate strategies with higher payoffs (clustered pairwise, near the lower-center of the clustered response graph), and finally the two key strategies that establish the cyclical relationship within the game through k-level reasoning (19 and 20 units, corresponding to level-1 and level-0 players, in the far-left of the clustered response graph).

This analysis extends to more complex instances of empirical games, which involve trained AI agents, as next exemplified. Consider first the game of Go, as played by 7 AlphaGo variants: AG(r), AG(p), AG(v), AG(rv), AG(rp), AG(vp), and AG(rvp), where each variant uses the specified combination of rollouts r, value networks v, and/or policy networks p. We analyze the empirical game where each strategy corresponds to one of these agents, and payoffs (Fig. 6a) correspond to the win rates of these agents when paired against each other (as detailed by Silver et al.7, Table 9). The α-Rank distribution indicated by the node (i.e., strategy) color intensities in Fig. 6b reveals AG(rvp) as a dominant strategy, and the cycle distribution graph (Fig. 6c) reveals a lack of cycles here. The spectral response graph, however, goes further, revealing a fully transitive structure (Fig. 6d, e), as in the motivating transitive games discussed earlier. The spectral analysis of this particular empirical game, therefore, reveals its simple underlying transitive structure (Fig. 6f).

Consider a more interesting empirical game, wherein agents are trained to play soccer in the continuous control domain of MuJoCo, exemplified in Fig. 6 (second row). Each agent in this empirical game is generated using a distinct set of training parameters (e.g., feedforward vs. recurrent policies, reward shaping enabled or disabled, etc.), with full agent specifications and payoffs detailed by Liu et al.80. The spectral response graph (Fig. 6k) reveals two outlier agents: a strictly dominated agent (node in the top-right), and a strong (yet not strictly dominant) agent (node in the top-left). Several agents here are clustered pairwise, revealing their closely-related interactions with respect to the other agents; such information could, for example, be used to discard or fuse such redundant agents during training to save computational costs.
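For concreteness, here is a sketch of computing an α-Rank-style ranking directly from an empirical win-rate table such as the AlphaGo one. The paper's exact finite-population evolutionary model is given in the "Methods" section and Omidshafiei et al.50; the logistic switching rule below is a simplified stand-in.

```python
import numpy as np

def alpha_rank_like(win_rates, alpha=10.0):
    """Stationary distribution of a strategy-switching Markov chain.

    win_rates[i, j]: empirical probability that agent i beats agent j.
    The logistic transition rule is a simplification of the alpha-Rank
    evolutionary model used in the paper."""
    n = win_rates.shape[0]
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:  # chance of switching from agent i to agent j
                P[i, j] = (1.0 / (n - 1)) / (
                    1.0 + np.exp(-alpha * (win_rates[j, i] - 0.5)))
        P[i, i] = 1.0 - P[i].sum()  # rows of P sum to one
    vals, vecs = np.linalg.eig(P.T)  # stationary distribution: left eigenvector
    pi = np.abs(np.real(vecs[:, np.argmin(np.abs(vals - 1.0))]))
    return pi / pi.sum()
```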
Fig. 7 AlphaStar results, with both the full league and the league restricted to the best agents visualized. Spectral analysis of the empirical AlphaStar League game reveals several key subsets of closely-performing agents, illustrated in d. Closer inspection of the agents used to construct this empirical payoff table reveals the following insights, with agent types corresponding to those detailed in Vinyals et al.9: (i) the blue, orange, and green clusters are composed of agents in the initial phases of training, which are generally weakest (as observed in d, and also visible as the narrow band of low payoffs in the top region of the payoff table a); (ii) the red cluster consists primarily of various specialized exploiter agents; (iii) the purple and brown clusters are primarily composed of the league exploiters and main agents, with the latter being generally of higher strength than the former.
Consider next a significantly larger-scale empirical game, consisting of 888 StarCraft II agents from the AlphaStar Final league of Vinyals et al.9. StarCraft II is a notable example, involving a choice of 3 races per player and realtime gameplay, making a wide array of behaviors possible in the game itself. The empirical game considered is visualized in Fig. 7a, and is representative of a large number of agents with varying skill levels. Despite its size, spectral analysis of this empirical game reveals that several key subsets of closely-performing agents exist here, illustrated in Fig. 7d. Closer inspection of the agents used to construct this empirical payoff table reveals the following insights, with agent types corresponding to those detailed in Vinyals et al.9: (i) the blue, orange, and green clusters are composed of agents in the initial phases of training, which are generally weakest (as observed in Fig. 7d, and also visible as the narrow band of low payoffs in the top of Fig. 7a); (ii) the red cluster consists primarily of various specialized exploiter agents; (iii) the purple and brown clusters are primarily composed of the league exploiters and main agents, with the latter being generally of higher strength than the former. To further ascertain the relationships between only the strongest agents, we remove the three clusters corresponding to the weakest agents, repeating the analysis in Fig. 7 (bottom row). Here, we observe the presence of a series of progressively stronger agents (bottom nodes in Fig. 7h), as well as a single outlier agent which quite clearly bests several of these clusters (top node of Fig. 7h).

An important caveat, at this stage, is that the agents in AlphaGo, MuJoCo soccer, and AlphaStar above were trained to maximize performance, rather than to explicitly reveal insights into their respective underlying games of Go, soccer, and StarCraft II. Thus, this analysis focused on characterizing relationships between the agents from the Policy Problem perspective, rather than the underlying games themselves, which provide insights into the interestingness of the game (Problem Problem). This latter investigation would require a significantly larger population of agents, which covers the policy space of the underlying game effectively, as exemplified next.

Naturally, characterization of the underlying game can be achieved in games small enough that all possible policies can be explicitly compared against one another. For instance, consider Blotto(τ, ρ), a zero-sum two-player game wherein each player has τ tokens that they can distribute amongst ρ regions81. In each region, the player with the most tokens wins (see Tuyls et al.53 for additional details). In the variant we analyze here, each player receives a payoff of +1, 0, and −1 per region, respectively, won, drawn, and lost. The permutations of each player's allocated tokens, in turn, induce strong cyclical relations between the possible policies in the game. While the strategy space for this game is of size $\binom{\tau + \rho - 1}{\rho - 1}$, payoff matrices can be fully specified for small instances, as shown for Blotto(5,3) and Blotto(10,3) in Fig. 8 (first and second row, respectively). Despite the differences in strategy space sizes in these particular instances of Blotto, the contracted response graphs in Fig. 8d, h capture the cyclical relations underlying both instances, revealing a remarkably similar structure.

For larger games, the cardinality of the pure policy space typically makes it infeasible to fully enumerate policies and construct a complete empirical payoff table, despite the pure policy space being finite in size. For example, even in games such as Tic–Tac–Toe, while the number of unique board configurations is reasonably small (9! = 362880), the number of pure, behaviorally unique policies is enormous (≈10^567, see Czarnecki et al.37, Section J, for details). Thus, coming up with a principled definition of a scheme for sampling a relevant set of policies summarizing the strategic interactions possible within large games remains an important open problem. In these instances, we rely on sampling policies in a manner that captures a set of representative policies, i.e., a set of policies of varying skill levels, which approximately captures the variations of strategic interactions possible in the underlying game. The policy sampling approach we use is motivated with the above discussion and open question in mind, in that it samples a set of policies with varying skill levels, leading to a diverse set of potential transitive and intransitive interactions between them.

Specifically, we use the policy sampling procedure proposed by Czarnecki et al.37, which also seeks a set of representative policies for a given game.
Fig. 8 Results for Blotto(5,3), Blotto(10,3), and Go (board size=3). Despite the significant difference in sizes, both instances of Blotto yield a remarkably
similar contracted response graph. Moreover, the contracted response graph for Go is notably different from AlphaGo results, due to the latter being an
empirical game constructed from trained AI agents rather than a representative set of sampled policies.
The details of this procedure are provided in Supplementary Methods 1, and at a high level involve three phases: (i) using a combination of tree search algorithms, Alpha–Beta82 and Monte Carlo Tree Search83, with varying tree depth limits for the former and varying numbers of simulations allotted to the latter, thus yielding policies of varying transitive strengths; (ii) using a range of random seeds in each instantiation of the above algorithms, thus producing a range of policies for each level of transitive strength; and (iii) repeating the same procedure with negated game payoffs, thus also covering the space of policies that actively seek to lose the original game. While this sampling procedure is a heuristic, it produces a representative set of policies with varying degrees of transitive and intransitive relations, and thus provides an approximation of the underlying game that can be feasibly analyzed.
Let us revisit the example of Go, constructing our empirical game using the above policy sampling scheme, rather than the AlphaGo agents used earlier. We analyze a variant of the game with board size 3 × 3, as shown in Fig. 8 (third row). Notably, the contracted response graph (Fig. 8l) reveals the presence of a strongly-cyclical structure in the underlying game, in contrast to the AlphaGo empirical game (Fig. 6). Moreover, the presence of a reasonably strong agent (visible in the top of the contracted response graph) becomes evident here, though this agent also shares cyclical relations with several sets of other agents. Overall, this analysis exemplifies the distinction between analyzing an underlying game (e.g., Go) vs. analyzing the agent training process (e.g., AlphaGo). Investigation of links between these two lines of analysis, we believe, makes for an interesting avenue for future work.

Linking response graph and computational complexity. A question that naturally arises is whether certain measures over response graphs are correlated with the computational complexity of solving their associated games. We investigate these potential correlations here, while noting that these results are not intended to propose that a specific definition of computational complexity (e.g., with respect to Nash) is explicitly useful for defining a topology/classification over games. In Fig. 9, we compare several response graph complexity measures against the number of iterations needed to solve a large collection of games using the Double Oracle algorithm73. The results here consider specifically the α-Rank entropy, the number of 3-cycles, and the mean in-degree (with details in the "Methods" section, and results for additional measures included in Supplementary Note 2). As in earlier experiments, the solution of small-scale games is computed using payoffs over full enumeration of pure policies, whereas that of larger games is done using the empirical games over sampled policies. Each graph complexity measure reported is normalized with respect to the maximum measure possible in a graph of the same size, and the number of iterations to solve is normalized with respect to the number of strategies in the respective game. Thus, for each game, the normalized number of iterations to solve provides a measure of its relative computational complexity compared to games with the same strategy space size (for completeness, we include experiments testing the effects of normalization on these results in Supplementary Note 2).

Several trends can be observed in these results. First, the entropy of the α-Rank distribution associated with each game correlates well with its computational complexity (see Spearman's correlation coefficient ρs in the top-right of Fig. 9a). This matches intuition, as higher entropy α-Rank distributions indicate a larger support over the strategy space (i.e., strong strategies, with non-zero α-Rank mass), thus requiring additional iterations to solve. Moreover, the number of 3-cycles in the response graph also correlates well with computational complexity, again matching intuition, as the intransitivities introduced by cycles typically make it more difficult to traverse the strategy space42.
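The three measures compared in Fig. 9 are straightforward to extract from a response graph; the sketch below follows the normalization convention described above (the specific maxima used for normalization are our assumptions).

```python
import numpy as np

def complexity_features(W, pi):
    """Response graph complexity measures used in Fig. 9 (sketch).

    W:  response graph adjacency weights; pi: alpha-Rank distribution."""
    n = W.shape[0]
    A = (W > 0.5).astype(float)  # binarize: keep 'preferred deviation' edges
    entropy = -np.sum(pi * np.log(pi + 1e-12))
    three_cycles = np.trace(A @ A @ A) / 3.0  # each directed 3-cycle counted 3x
    mean_in_degree = A.sum(axis=0).mean()
    return {  # normalized against maxima for a graph of the same size
        "alpha_rank_entropy": entropy / np.log(n),
        "three_cycles": three_cycles / (n * (n - 1) * (n - 2) / 3.0),
        "mean_in_degree": mean_in_degree / (n - 1),
    }
```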
[Figure 9: scatter plots of normalized response graph complexity vs. normalized iterations to solve, across 28 games (Blotto variants, the 3-move parity game, the AlphaStar league, the Disc game, Elo games at several noise levels, Kuhn Poker, the Normal Bernoulli game, Rock-Paper-Scissors, the Random Game of Skill, the Transitive game, Connect Four, Go, Hex, Quoridor, Tic-Tac-Toe, and their Misère counterparts). a α-Rank distribution entropy (ρs = 0.61, p = 0.00); b number of 3-cycles (ρs = 0.63, p = 0.00); c mean in-degree (ρs = 0.45, p = 0.02).]
Fig. 9 Response graph complexity vs. computational complexity of solving associated games. Each figure plots a respective measure of graph
complexity against the normalized number of iterations needed to solve the associated game via the Double Oracle algorithm (with normalization done
with respect to the total number of strategies in each underlying game). The Spearman correlation coefficient, ρs, is shown in each figure (with the reported
two-sided p-value rounded to two decimals).
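For reference, the reported ρs values are ordinary Spearman rank correlations, computable with SciPy; the data below are dummy values standing in for the per-game measurements.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
entropy = rng.uniform(0.0, 1.0, 28)                      # one value per game
iters = 0.1 + 0.8 * entropy + rng.normal(0.0, 0.1, 28)   # iterations / #strategies
rho_s, p = spearmanr(iters, entropy)
print(f"Spearman rho_s = {rho_s:.2f}, p = {p:.2f}")
```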
Finally, the mean in-degree over all response graph nodes correlates less so with computational complexity (though degree-based measures still serve a useful role in characterizing and distinguishing graphs of differing sizes84). Overall, these results indicate that response graph complexity provides a useful means of quantifying the computational complexity of games.

The landscape of games. The results, thus far, have demonstrated that graph-theoretic analysis can simplify games (via spectral clustering), uncover their topological structure (e.g., the transitive structure of the AlphaGo empirical game), and yield measures correlated with the computational complexity of solving these games. Overall, it is evident that the perspective offered by graph theory yields a useful characterization of games across multiple fronts. Given this insight, we next consider whether this characterization can be used to compare a widely-diverse set of games.

To achieve this, we construct empirical payoff tables for a suite of games, using the policy sampling scheme described earlier for the larger instances (also see Supplementary Methods 1 for full details, including a description of the games considered and an analysis of the sensitivity of these results to the choice of empirical policies and mixtures thereof). For each game, we compute the response graph and several associated local and global complexity measures (e.g., α-Rank distribution entropy, number of 3-cycles, node-wise in-degree and out-degree statistics, and several other measures detailed in the "Methods" section), which constitute a feature vector capturing properties of interest. Finally, a principal component analysis of these features yields the low-dimensional visualization of the landscape of games considered, shown in Fig. 1.

We make several key insights given this empirical landscape of games. Notably, variations of games with related rules are well-clustered together, indicating strong similarity despite the widely-varying sizes of their policy spaces and of the empirical games used to construct them; specifically, all considered instances of Blotto cluster together, with empirical game sizes ranging from 20 × 20 for Blotto(5,3) to 1000 × 1000 for Blotto(10,5). Moreover, games with strong transitive components (e.g., variations of Elo games, AlphaStar League, Random Game of Skill, and Normal Bernoulli Game) are also notably separated from strongly cyclical games (Rock–Paper–Scissors, Disc game, and Blotto variations). Closely-related real-world games (i.e., games often played by humans in the real world, such as Hex, Tic-Tac-Toe, Connect Four, and each of their respective Misère counterparts wherein players seek to lose) are also well-clustered. Crucially, the strong alignment of this analysis with intuitions of the similarity of certain classes of games serves as an important validation of the graph-based analysis technique proposed in this work. In addition, the analysis and corresponding landscape of games make clear that several games of interest for AI seem well-clustered together, which also holds for less interesting games (e.g., Transitive and Elo games).

We note that the overall idea of generating such a landscape over games ties closely with prior works on taxonomization of multiplayer games38,85. Moreover, 2D visualization of the expressivity (i.e., style and diversity) and of the overall space of procedurally-generated game features has also been investigated in closely related work86,87. A recent line of related inquiry also investigates the automatic identification, and subsequent visualization, of core mechanics in single-player games88. Overall, we believe this type of investigation can be considered a method to taxonomize which future multiplayer games may be interesting to train AI agents on, and which ones less so.

While the primary focus of this paper is to establish a means of topologically studying games (and their similarities), a natural artifact of such a methodology is that it can enable investigation of interesting and non-interesting classes of games. There exist numerous perspectives on what may make a game interesting, which vary as a function of the field of study or problem being solved. These include (overlapping) paradigms that consider interestingness from: a human-centric perspective (e.g., level of social interactivity, cognitive learning and problem solving, enjoyment, adrenaline, inherent challenge, esthetics, storytelling in the game, etc.)89–94; a curriculum learning perspective (e.g., games or tasks that provide enough learning signal to the human or artificial learner)95; a procedural content generation or optimization perspective (in some instances with a focus on General Game Playing)21 that uses a variety of fitness measures to generate new game instances (e.g., either direct measures related to the game structure, or indirect measures such as player win-rates)20,31,34,35,96; and game-theoretic, multiplayer, or player-vs.-task notions (e.g., game balance, level of competition, social equality or welfare of the optimal game solution, etc.)97–99. Overall, it is complex (and, arguably, not very useful) to introduce a unifying definition of interestingness that covers all of the above perspectives.

Thus, we focus here on a specialized notion of interestingness from the perspective of AI training, linked to the work of Czarnecki et al.37. As mentioned earlier, Czarnecki et al.37 introduce the notion of Games of Skill, which are of interest, in the sense of AI training, due to two axes of interactions between agents: a transitive axis enabling progression in terms of relative strength or skill, and a radial axis representing diverse intransitive/cyclical interactions between strategies of similar strength levels. The overall outlook provided by the above paper is that in these games of interest there exist many average-strength strategies with intransitive relations among them, whereas the level of intransitivity decreases as transitive strength moves towards an extremum (either very low, or very high strength); the topology of strategies in a Game of Skill is, thus, noted to resemble a spinning top. Through the spinning top paradigm, Czarnecki et al.37 identify several real-world games (e.g., Hex, Tic–Tac–Toe, Connect–Four, etc.) as Games of Skill. Notably, the lower left cluster of games in Fig. 1 highlights precisely these games. Interestingly, while the Random Game of Skill, AlphaStar League, and Elo game (noise = 0.5) are also noted as Games of Skill in Czarnecki et al.37, they are found in a distinct cluster in our landscape. On closer inspection of the payoffs for this trio of games, there exists a strong correlation between the number of intransitive relations and the transitive strength of strategies, in contrast to other Games of Skill such as Hex. Our landscape also seems to highlight non-interesting games. Specifically, variations of Blotto, Rock–Paper–Scissors, and the Disc Game are noted to not be Games of Skill in Czarnecki et al.37, and are also found to be in distinct clusters in Fig. 1.

Overall, these results highlight how topological analysis of multiplayer games can be used to not only study individual games, but also identify clusters of related (potentially interesting) games. For completeness, we also conduct additional studies in Supplementary Note 2, which compare our taxonomization of 2 × 2 normal-form games (and their clustering into potential classes of interest) to that of Bruns38.

The problem problem and procedural game structure generation. Having now established various graph-theoretic tools for characterizing games of interest, we revisit the so-called Problem Problem, which targets the automatic generation of interesting environments. Here we focus on the question of how we can leverage the topology discovered by our method to procedurally generate collections of new games (which can be subsequently analyzed, used for training, characterized as interesting or not per previous discussions, and so on).

Full details of our game generation procedure are provided in the "Methods" section. At a high level, we establish the feedback loop visualized in Fig. 3, enabling automatic generation of games as driven by our graph-based analytical workflow. We generate the payoff structure of a game (i.e., as opposed to the raw underlying game rules, e.g., as done by Browne and Maire20). Thus, the generated payoffs can be considered either direct representations of normal form games, or empirical games indirectly representing underlying games with complex rules. At a high level, given a parameterization of a generated game that specifies an associated payoff tensor, we synthesize its response graph and associated measures of interest (as done when generating the earlier landscape of games). We use the multidimensional Elo parameterization for generating payoffs, due to its inherent ability to specify complex transitive and intransitive games41. We then specify an objective function of interest to optimize over these graph-based measures. As only the evaluations of such graph-based measures (rather than their gradients) are typically available, we use a gradient-free approach to iteratively generate games optimizing these measures (CMA-ES100 is used in our experiments). The overall generation procedure used can be classified as a Search-Based Procedural Content Generation (SBPCG) technique34. Specifically, in accordance with the taxonomy defined by Togelius et al.34, our work uses a direct encoding representation of the game (as the generated payoffs are represented as real-valued vectors of mElo parameters), with a theory-driven direct evaluation used for quantifying the game fitness/quality (as we rely on graph theory to derive the game features of interest, then directly minimize distances between the principal components of generated and target games).

Naturally, we can maximize any individual game complexity measure, or a combination thereof, directly (e.g., entropy of the α-Rank distribution, number of 3-cycles, etc.). More interestingly, however, we can leverage our low-dimensional landscape of games to directly drive the generation of new games towards existing ones with properties of interest. Consider the instance of game generation shown in Fig. 10a, which shows an overview of the above pipeline generating a 5 × 5 game minimizing Euclidean distance (within the low-dimensional complexity landscape) to the standard 3 × 3 Rock–Paper–Scissors game. Each point on this plot corresponds to a generated game instance. The payoffs visualized, from left to right, respectively correspond to the initial procedural game parameters (which specify a game with constant payoffs), intermediate parameters, and final optimized parameters; projections of the corresponding games within the games landscape are also indicated, with the targeted game of interest (Rock–Paper–Scissors here) highlighted in green. Notably, the final optimized game exactly captures the underlying rules that specify a general-size Rock–Paper–Scissors game, in that each strategy beats as many other strategies as it loses to. In Fig. 10b, we consider a larger 13 × 13 generated game, which seeks to minimize distance to a 1000 × 1000 Elo game (which is transitive in structure, as in our earlier motivating example in Fig. 2a). Once again, the generated game captures the transitive structure associated with Elo games.

Next, we consider generation of games that exhibit properties of mixtures of several target games. For example, consider what happens if 3 × 3 Rock–Paper–Scissors were to be combined with the 1000 × 1000 Elo game above; one might expect a mixture of transitive and cyclical properties in the payoffs, though the means of generating such mixed payoffs directly is not obvious, due to the inherent differences in sizes of the targeted games. Using our workflow, which conducts this optimization in the low-dimensional graph-based landscape, we demonstrate a sequence of generated games targeting exactly this mixture in Fig. 10c. Here, the game generation objective is to minimize the Euclidean distance to the mixed principal components of the two target games (weighted equally). The payoffs of the final generated game exhibit exactly the properties intuited above, with the predominantly positive (blue) upper triangle of payoff entries establishing a transitive structure, and the more sporadic positive entries in the lower triangle establishing cycles.

Naturally, this approach opens the door to an important avenue for further investigation, targeting generation of yet more complex games.
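The generation loop described above admits a compact sketch. The mElo payoff parameterization follows the construction of Balduzzi et al.41 (transitive ratings plus paired cyclic feature dimensions); the `cma` package and the `feature_fn` argument, which stands in for the response-graph-statistics-plus-PCA projection, are our assumptions for illustration.

```python
import numpy as np
import cma  # pycma, assumed available (pip install cma)

def melo_payoffs(params, n, k=1):
    """Antisymmetric payoffs from a multidimensional Elo parameterization:
    transitive ratings r plus 2k-dimensional cyclic features C (cf. ref. 41)."""
    r, C = params[:n], params[n:].reshape(n, 2 * k)
    Omega = np.zeros((2 * k, 2 * k))
    for l in range(k):  # antisymmetric pairing of the feature dimensions
        Omega[2 * l, 2 * l + 1], Omega[2 * l + 1, 2 * l] = 1.0, -1.0
    return np.tanh(r[:, None] - r[None, :] + C @ Omega @ C.T)

def generate_game(target, feature_fn, n=5, k=1, sigma0=0.5):
    """Gradient-free (CMA-ES) search for an n x n game whose landscape
    coordinates match `target` (e.g., the principal components of
    Rock-Paper-Scissors); `feature_fn` maps payoffs to those coordinates."""
    def loss(params):
        return float(np.linalg.norm(
            feature_fn(melo_payoffs(params, n, k)) - target))
    x0 = np.zeros(n + 2 * n * k)  # zero parameters: a constant-payoff game
    best, _ = cma.fmin2(loss, x0, sigma0)
    return melo_payoffs(best, n, k)
```

Initializing the parameters at zero yields a constant-payoff game, matching the left-most payoff visualizations in Fig. 10.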
Fig. 10 Visualization of procedural game structure generation projected in the games landscape. Each figure visualizes the generation of a game of
specified size, which targets a pre-defined game (or mixture of games) of a different size. The three payoffs in each respective figure, from left to right,
correspond to the initial procedural game parameters, intermediate parameters, and final optimized parameters. a 5 × 5 generated game with the target
game set to Rock–Paper–Scissors (3 × 3). b 13 × 13 generated game with the target game set to Elo (1000 × 1000). c 13 × 13 generated game with the target
game set to the mixture of Rock–Paper–Scissors (3 × 3) and Elo game (1000 × 1000). d 19 × 19 generated game with the target game set to the mixture of
AlphaStar League and Go (board size = 3). Strategies are sorted by mean payoffs in b and c to more easily identify transitive structures expected from an
Elo game.
presented a comprehensive study of games under the lens of graph theory and empirical game theory, operating on the response graph of any game of interest. The proposed approach applies to general-sum, many-player games, enabling richer understanding of the inherent relationships between strategies (or agents), contraction to a representative (and smaller) underlying game, and identification of a game's inherent topology. We highlighted insights offered by this approach when applied to a large suite of games, including canonical games, empirical games consisting of trained agent policies, and real-world games consisting of representative sampled policies, extending well beyond typical characterizations of games using raw payoff visualizations, cardinal measures such as strategy or game tree sizes, or strategy rankings. We demonstrated that complexity measures associated with the analyzed response graphs correlate well with the computational complexity of solving these games and, importantly, enable the visualization of the landscape of games in relation to one another (as in Fig. 1). The games landscape exposed here was then leveraged to procedurally generate games, providing a principled means of better understanding and facilitating the solution of the so-called Problem Problem.
While the classes of games generated in this paper were restricted to the normal form (e.g., generalized variants of Rock–Paper–Scissors), they served as an important validation of the proposed approach. Specifically, this work provides a foundational layer for generating games that are of interest in a richer context of domains. In contrast to some of the prior works in taxonomization of games (e.g., those of Bruns38, Liebrand39, and Rapoport and Guyer40), a key strength of our approach is that it does not rely on human expertise, or on manual isolation of patterns in payoffs or equilibria, to compute a taxonomy over games. We demonstrate an example of this in Supplementary Note 2, by highlighting similarities and differences of our taxonomization with that of Bruns38 over a set of 144 2 × 2 games. Importantly, these comparisons highlight that our response graph-based approach does not preclude more classical equilibrium-centric analysis from being conducted following clustering, while also avoiding the need for a human-in-the-loop analysis of equilibria when classifying the games themselves.
While our graph-based game analysis approach (i.e., the spectral analysis and clustering technique) applies to general-sum, many-player games, the procedural game structure generation approach used in our experiments is limited to zero-sum games (due to our use of the mElo parameterization). However, any other general-sum payoff parameterization approach (e.g., even direct generation of the payoff entries) can be used to avoid the zero-sum constraint. The study of normal-form games continues to play a prominent role in the game theory and machine learning literature102–108, and as such the procedural generation of normal-form games can play an important role in the research community. An important line of future work will involve investigating means of generalizing this approach to the generation of more complex classes of games. Specifically, one way to generate more complex underlying games would be to parameterize core mechanisms of such a game class (either explicitly, or via mechanism discovery, e.g., using a technique such as that described by Charity et al.88). Subsequently, one could train AI agents on a population of such games, construct a corresponding empirical game, and use the response graph-based techniques introduced here to analyze the space of such games (e.g., in a manner reminiscent of Wang et al.35, though under our graph-theoretic lens). The connection to the SBPCG literature mentioned in The Problem Problem and Procedural Game Structure Generation section also offers an avenue of alternative investigations into game structure generation, as the variations of fitness measures and representations previously explored in that literature34 may be considered in lieu of the approach used here.

Moreover, as the principal contribution of this paper was to establish a graph-theoretic approach for investigating the landscape of games, we focused our investigation on empirical analysis of a large suite of games. As such, for larger games, our analysis relied on sampling of a representative set of policies to characterize them. An important limitation here is that the empirical game-theoretic results are subject to the policies used to generate them. While our sensitivity analysis (presented in Supplementary Note 2) seems to indicate that the combination of our policy sampling scheme and analysis pipeline produces fairly robust results, this is an important factor to revisit in significantly larger games. Specifically, such a policy sampling scheme can be inherently expensive for extremely large games, making it important to further investigate alternative sampling schemes and associated sensitivities. Consideration of an expanded set of such policies (e.g., those that balance the odds for players by ensuring a near-equal win probability), and of correlations between the empirical game complexity and the complexity of the underlying policy representations (e.g., deep vs. shallow neural networks or, whenever possible, Boolean measures of strategic complexity109,110), also seems interesting to investigate. Moreover, formal study of the propagation of empirical payoff variance to the topological analysis results is another avenue of future interest, potentially using techniques similar to Rowland et al.51. Another direction of research for future work is to analyze agent-vs.-task games (e.g., those considered in Balduzzi et al.41) from a graph-theoretic lens. Finally, our approach is general enough to be applicable to other areas in the social and life sciences111–115, characterized by complex ecologies often involving a large number of strategies or traits. In particular, these processes may be modeled through the use of large-scale response graphs or invasion diagrams116–120, whose overall complexity (and how it may vary with the inclusion or removal of species, strategies, and conflicts) is often hard to infer.

Overall, we believe that this work paves the way for related investigations of theoretical properties of graph-based games analysis, for further scientific progress on the Problem Problem and task theory, and for further links to related works investigating the geometry and structure of games37,40,42,85,101,121.

Methods
Games. Our work applies to K-player, general-sum games, wherein each player k ∈ [K] has a finite set Sk of pure strategies. The space of pure strategy profiles is denoted S = ∏k Sk, where a specific pure strategy profile instance is denoted s = (s1, …, sK) ∈ S. For a given profile s ∈ S, the payoffs vector is denoted M(s) = (M1(s), …, MK(s)) ∈ ℝ^K, where Mk(s) is the payoff for each player k ∈ [K]. We denote by s−k the profile of strategies used by all but the k-th player. A game is said to be zero-sum if ∑k Mk(s) = 0 for all s ∈ S. A game is said to be symmetric if all players have the same strategy set, and Mk(s1, …, sK) = Mρ(k)(sρ(1), …, sρ(K)) for all permutations ρ, strategy profiles s, and player indices k ∈ [K].
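For concreteness, these definitions can be checked mechanically; the following minimal numpy sketch (the payoff-tensor layout and helper names are our own illustrative conventions, not part of the paper's codebase) verifies the zero-sum and two-player symmetry conditions for Rock–Paper–Scissors:

import numpy as np

# Illustrative convention: M[k][s1, s2] is player k's payoff under the
# pure strategy profile s = (s1, s2) in a 2-player game.
rps = np.array([[ 0, -1,  1],
                [ 1,  0, -1],
                [-1,  1,  0]], dtype=float)   # Rock-Paper-Scissors
M = np.stack([rps, -rps])                     # the two payoffs sum to zero

def is_zero_sum(M, tol=1e-9):
    # sum_k M_k(s) = 0 for all profiles s.
    return bool(np.all(np.abs(M.sum(axis=0)) < tol))

def is_symmetric_two_player(M, tol=1e-9):
    # For K = 2 with a shared strategy set: M_1(s1, s2) = M_2(s2, s1).
    return bool(np.all(np.abs(M[0] - M[1].T) < tol))

print(is_zero_sum(M), is_symmetric_two_player(M))   # True True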
Empirical games. For the real-world games considered (e.g., Go, Tic–Tac–Toe, etc.), we conduct our analysis using an empirical game-theoretic approach67,68,122–124. Specifically, rather than consider the space of all pure strategies in the game (which can be enormous, even in the case of, e.g., Tic–Tac–Toe), we construct an empirical game over meta-strategies, which can be considered higher-level strategies over atomic actions. In empirical games, a meta-strategy sk for each player k corresponds to a sampled policy (e.g., in the case of our real-world games examples) or an AI agent (e.g., in our study of AlphaGo, where each meta-strategy was a specific variant of AlphaGo). Empirical game payoffs are calculated according to the win/loss ratio of these meta-strategies against one another, over many trials of the underlying games. From a practical perspective, game-theoretic analysis applies to empirical games (over agents) in the same manner as standard games (over strategies); thus, we consider strategies and agents as synonymous in this work. Overall, empirical games provide a useful abstraction of the underlying game that enables the study of significantly larger and more complex interactions.
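For illustration, the following minimal sketch estimates such an empirical payoff matrix from repeated match outcomes; the match simulator and scalar "skill" parameters are hypothetical stand-ins for the sampled policies and game simulations used in our experiments:

import numpy as np

rng = np.random.default_rng(0)

def empirical_payoff(agents, play_match, num_trials=100):
    # M[i, j]: average outcome of meta-strategy i vs. meta-strategy j.
    n = len(agents)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            M[i, j] = np.mean([play_match(agents[i], agents[j])
                               for _ in range(num_trials)])
    return M

def noisy_match(a, b):
    # Stand-in simulator: higher "skill" wins more often; +1 win, -1 loss.
    return 1.0 if rng.normal(a - b, 1.0) > 0 else -1.0

skills = [0.0, 0.5, 1.0, 2.0]   # hypothetical sampled-policy strengths
print(empirical_payoff(skills, noisy_match).round(2))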
Finite population models and α-Rank. In game theory125–129, one often seeks algorithms or models for evaluating and training strategies with respect to one another (i.e., models that produce a score or ranking over strategy profiles, or an equilibrium over them). As a specific example, the Double Oracle algorithm73, which is used to quantify the computational complexity of solving games in some of our experiments, converges to Nash equilibria, albeit only in two-player zero-sum games. More recently, a line of research has introduced and applied the α-Rank algorithm49–51 for the evaluation of strategies in general-sum, n-player, many-strategy games. α-Rank leverages notions from stochastic evolutionary dynamics in finite populations118,130–134 in the limit of rare mutations117,120,135, which are subsequently analyzed to produce scalar ratings (one per strategy or agent). At a high level, α-Rank models the probability of a population transitioning from a given strategy to a new strategy, by considering the additional payoff the population would receive via such a deviation. These evolutionary relations are considered between all strategies in the game, and are summarized in its so-called response graph. α-Rank then uses the stationary distribution over this response graph to quantify the long-term propensity of playing each of the strategies, assigning a scalar score to each.

Overall, α-Rank yields a useful representation of the limiting behaviors of the players, providing a summary of the characteristics of the underlying game, albeit a 1-dimensional one (a scalar rating per strategy profile). In our work, we instead exploit the higher-dimensional structural properties of the α-Rank response graph to make more informed characterizations of the underlying game, rather than computing scalar rankings.
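For illustration, the stationary distribution underlying these scalar scores is the leading left eigenvector of the transition matrix; the following minimal sketch uses a hypothetical 3-profile matrix rather than one derived from an actual game:

import numpy as np

def stationary_distribution(E):
    # Left eigenvector of E for eigenvalue 1: pi^T E = pi^T, normalized.
    evals, evecs = np.linalg.eig(E.T)
    pi = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    pi = np.abs(pi)
    return pi / pi.sum()

# Hypothetical 3-profile ergodic transition matrix (rows sum to 1).
E = np.array([[0.2, 0.7, 0.1],
              [0.3, 0.3, 0.4],
              [0.1, 0.6, 0.3]])
print(stationary_distribution(E))   # long-run mass on each profile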
Response graphs. The α-Rank response graph50 provides the mathematical model that underpins our analysis. It constitutes an analog (yet not equivalent) model of the invasion graphs used to describe the evolution dynamics in finite populations in the limit when mutations are rare (see, e.g., Fudenberg and Imhof117, Hauert et al.136, Van Segbroeck et al.120, Vasconcelos et al.119). In this small-mutation approximation, directed edges stand for the fixation probability131 of a single mutant in a monomorphic population of resident individuals (the vertices), such that all transitions are computed through a process involving only two strategies at a time. Here, we use a similar approach. Let us consider a pure strategy profile s = (s1, …, sK). Consider a unilateral deviation (corresponding to a mutation) of player k from playing sk ∈ Sk to a new strategy σk ∈ Sk, thus resulting in a new profile σ = (σk, s−k). The response graph associated with the game considers all such deviations, defining transition probabilities between all pairs of strategy profiles involving a unilateral deviation. Specifically, let Es,σ denote the transition probability from s to σ (where the latter involves a unilateral deviation), defined as

    E_{s,\sigma} = \begin{cases} \eta\,\dfrac{1-\exp(-\alpha(M_k(\sigma)-M_k(s)))}{1-\exp(-\alpha m(M_k(\sigma)-M_k(s)))} & \text{if } M_k(\sigma) \neq M_k(s) \\ \eta/m & \text{otherwise,} \end{cases} \qquad (1)

where η is a normalizing factor denoting the reciprocal of the total number of unilateral deviations from a given strategy profile, i.e., η = (∑_{l=1}^{K} (|S_l| − 1))^{−1}. Furthermore, α ≥ 0 and m ∈ ℕ are parameters of the underlying evolutionary model considered, denoting, respectively, the so-called selection pressure and population size.

To further simplify the model and avoid sweeps over these parameters, we consider here the limit of infinite α introduced by Omidshafiei et al.50, which specifies transitions from lower-payoff profiles to higher-payoff ones with probability η(1 − ε), the reverse transitions with probability ηε, and transitions between strategies of equal payoff with probability η/2, where 0 < ε ≪ 1 is a small perturbation factor. We use ε = 10^{−10} in our experiments, and found low sensitivity of results to this choice given a sufficiently small value. For further theoretical exposition of α-Rank under this infinite-α regime, see Rowland et al.51. Given the pairwise strategy transitions defined as such, the self-transition probability of s is subsequently defined as

    E_{s,s} = 1 - \sum_{k \in [K]} \; \sum_{\sigma\,:\,\sigma_k \in S_k \setminus \{s_k\}} E_{s,\sigma}. \qquad (2)

As mentioned earlier, if two strategy profiles s and σ do not correspond to a unilateral deviation (i.e., differ in more than one player's strategy), no transition occurs between them under this model (i.e., Es,σ = 0).

The transition structure above is informed by particular models in evolutionary dynamics, as explained in detail in Omidshafiei et al.50. The introduction of the perturbation term ε effectively ensures the ergodicity of the associated Markov chain with row-stochastic transition matrix E. This transition structure then enables the definition of the α-Rank response graph of a game.

Definition 1 (Response graph). The response graph of a game is a weighted directed graph (digraph) G = (S, E) where each node corresponds to a pure strategy profile s ∈ S, and each weighted edge Es,σ quantifies the probability of transitioning from profile s to σ.

For example, the response graph associated with a transitive game is visualized in Fig. 3b, where each node corresponds to a strategy s, and directed edges indicate transition probabilities between nodes. Omidshafiei et al.50 define α-Rank π ∈ Δ^{|S|−1} as a probability distribution over the strategy profiles S, given by ordering the masses of the stationary distribution of E (i.e., the solution of the eigenvalue problem πᵀE = πᵀ). Effectively, the α-Rank distribution quantifies the average amount of time spent by the players in each profile s ∈ S under the associated discrete-time evolutionary population model117. Our proposed methodology uses the α-Rank response graphs in a more refined manner, quantifying the structural properties defining the underlying game, as detailed in the workflow outlined in Fig. 3 and, in more detail, below.
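To make this construction concrete, below is a minimal numpy sketch (an illustrative re-implementation, not the OpenSpiel code used in our experiments) that builds E for a K-player game in the infinite-α limit, using the simplified transition rule above together with the self-transitions of Eq. (2); the Matching Pennies payoffs at the bottom are a hypothetical test case:

import itertools

import numpy as np

def response_graph_infinite_alpha(payoffs, eps=1e-10):
    # payoffs: list of K arrays, payoffs[k][s] = M_k(s) for profile s.
    # Infinite-alpha rule: improving unilateral deviation -> eta * (1 - eps),
    # worsening deviation -> eta * eps, payoff tie -> eta / 2.
    shape = payoffs[0].shape                      # (|S_1|, ..., |S_K|)
    profiles = list(itertools.product(*[range(n) for n in shape]))
    idx = {p: i for i, p in enumerate(profiles)}
    eta = 1.0 / sum(n - 1 for n in shape)         # reciprocal of #deviations
    E = np.zeros((len(profiles), len(profiles)))
    for s in profiles:
        for k, n_k in enumerate(shape):
            for dev in range(n_k):
                if dev == s[k]:
                    continue
                sigma = s[:k] + (dev,) + s[k + 1:]
                gain = payoffs[k][sigma] - payoffs[k][s]
                E[idx[s], idx[sigma]] = (eta * (1 - eps) if gain > 0
                                         else eta * eps if gain < 0
                                         else eta / 2)
        E[idx[s], idx[s]] = 1.0 - E[idx[s]].sum()     # Eq. (2)
    return E, profiles

M1 = np.array([[1.0, -1.0], [-1.0, 1.0]])             # Matching Pennies
E, profiles = response_graph_infinite_alpha([M1, -M1])
print(profiles)
print(E.round(3))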
Spectral, clustered, and contracted response graphs. This section details the workflow used for spectral analysis of games' response graphs (i.e., the steps visualized in Fig. 3c to e). Response graphs are processed in two stages: (i) symmetrization (i.e., transformation of the directed response graphs to an associated undirected graph), and (ii) subsequent spectral analysis. This two-phase approach is a standard technique for the analysis of directed graphs, which has proved effective in a large body of prior works (see Malliaros and Vazirgiannis137 and Van Lierde138 for comprehensive surveys). In addition, spectral analysis of the response graph is closely associated with the eigenvalue analysis required when solving for the α-Rank distribution, establishing a shared formalism between our techniques and those of prior works.

Let A denote the adjacency matrix of the response graph G, where A = E as G is a directed weighted graph. We seek a transformation such that response graph strategies with similar relationships to neighboring strategies tend to have higher adjacency with one another. Bibliometric symmetrization139 provides a useful means to do so in application to directed graphs, whereby the symmetrized adjacency matrix is defined as Ã = AAᵀ + AᵀA. Intuitively, the (s, σ)-th entry of the first term, AAᵀ, captures the weighted number of other strategies that both s and σ would deviate to in the response graph G; the same entry in the second term, AᵀA, captures the weighted number of other strategies that would deviate to both s and σ. Hence, this symmetrization captures the relationship of each pair of response graph nodes (s, σ) with respect to all other nodes, ensuring high values of weighted adjacency when these strategies have similar relational roles with respect to all other strategies in the game. More intuitively, this ensures that in games such as Redundant Rock–Paper–Scissors (see Fig. 5, first row), sets of redundant strategies are considered to be highly adjacent to each other.
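As a small illustration of this operation (on a toy directed graph, unrelated to any specific game studied here), the symmetrization is a one-line computation:

import numpy as np

def bibliometric_symmetrization(A):
    # A_tilde = A A^T + A^T A: entry (s, sigma) grows with the weighted
    # number of shared out-neighbors (A A^T) and in-neighbors (A^T A).
    return A @ A.T + A.T @ A

# Toy directed graph: the first two nodes deviate to the same target, so
# they become strongly adjacent after symmetrization despite no direct edge.
A = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])
print(bibliometric_symmetrization(A))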
Following bibliometric symmetrization of the response graph, clustering proceeds as follows. For any partitioning of the strategy profiles S into sets S₁ ⊂ S and S̄₁ = S ∖ S₁, define w(S₁, S̄₁) = ∑_{s∈S₁, σ∈S̄₁} E_{s,σ}. Let the sets of disjoint strategy profiles {S_k}_{k∈[K]} partition S (i.e., ⋃_{k∈[K]} S_k = S). Define the K-cut of graph G under partitions {S_k}_{k∈[K]} as

    \mathrm{cut}(\{S_k\}) = \sum_k w(S_k, \bar{S}_k), \qquad (3)

which, roughly speaking, measures the connectedness across clusters; i.e., a low cut indicates that points in distinct clusters are not well-connected. A standard technique for cluster analysis of graphs is to choose the set of K partitions, {S_k}_{k∈[K]}, that minimizes Eq. (3). In certain situations, balanced clusters (i.e., clusters with similar numbers of nodes) may be desirable; here, a more suitable metric is the so-called normalized K-cut, or Ncut, of graph G under partitions {S_k}_{k∈[K]},

    \mathrm{Ncut}(\{S_k\}) = \sum_k \frac{w(S_k, \bar{S}_k)}{w(S_k, S)}. \qquad (4)

Unfortunately, the minimization problem associated with Eq. (4) is NP-hard even when K = 2 (see Shi and Malik140). A typical approach is to consider a spectral relaxation of this minimization problem, which corresponds to a generalized eigenvalue problem (i.e., efficiently solved via standard linear algebra); interested readers are referred to Shi and Malik140 and Van Lierde138 for further exposition. Define the Laplacian matrix L = D − Ã (respectively, the normalized Laplacian L = I − D^{−1/2} Ã D^{−1/2}), where the degree matrix D has diagonal entries D_{i,i} = ∑_j Ã_{i,j} and zeroes elsewhere. Then the eigenvectors associated with the lowest nonzero eigenvalues of L provide the desired spectral projection of the datapoints (i.e., the spectral response graph), with the desired number of projection dimensions corresponding to the number of eigenvectors kept. We found that using the unnormalized graph Laplacian L = D − Ã yielded intuitive projections in our experiments, which we visualize 2-dimensionally in our results (e.g., see Fig. 3c).

The relaxed clustering problem detailed above is subsequently solved by application of a standard clustering algorithm to the spectral-projected graph nodes. Specifically, we use agglomerative average-linkage clustering in our experiments (see Rokach and Maimon141, Chapter 15, for details). For determining the appropriate number of clusters, we use the approach introduced by Pham et al.142, which we found to yield more intuitive clusterings than the gap statistic143 for the games considered.

Following computation of clustered response graphs (e.g., Fig. 3d), we contract clustered nodes (summing edge probabilities accordingly), as in Fig. 3e. Note that, for clarity, our visualizations only show edges corresponding to transitions from lower-payoff to higher-payoff strategies in the standard, spectral, and clustered response graphs, as these bear the majority of transition mass between nodes; reverse edges (from higher-payoff to lower-payoff nodes) and self-transitions are not visually indicated, despite being used in the underlying spectral clustering. The exception is for contracted response graphs, where we do visualize weighted edges from higher-payoff to lower-payoff nodes; this is due to the node contraction process potentially yielding edges with non-negligible weight in both directions.
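To illustrate this pipeline end-to-end, the following minimal sketch (a simplified stand-in for our implementation: it fixes the number of clusters by hand rather than selecting it via Pham et al.142) symmetrizes a small response graph, projects it spectrally via the unnormalized Laplacian, clusters with average linkage, and contracts the clusters. The example matrix approximates the infinite-α response graph (ε ≈ 0) of single-population Rock–Paper–Scissors with a duplicated Rock:

import numpy as np
from sklearn.cluster import AgglomerativeClustering

def spectral_projection(A_tilde, dims=2):
    # Unnormalized Laplacian L = D - A_tilde; keep eigenvectors of the
    # lowest nonzero eigenvalues as the embedding coordinates.
    L = np.diag(A_tilde.sum(axis=1)) - A_tilde
    evals, evecs = np.linalg.eigh(L)                 # ascending order
    keep = np.where(evals > 1e-9)[0][:dims]
    return evecs[:, keep]

def cluster_and_contract(E, num_clusters, dims=2):
    A_tilde = E @ E.T + E.T @ E                      # bibliometric symmetrization
    X = spectral_projection(A_tilde, dims)
    labels = AgglomerativeClustering(
        n_clusters=num_clusters, linkage="average").fit_predict(X)
    C = np.zeros((E.shape[0], num_clusters))
    C[np.arange(E.shape[0]), labels] = 1.0           # one-hot cluster indicators
    return labels, C.T @ E @ C                       # summed transition mass

# Strategies [R, R', P, S]; rows sum to 1 (self-loops included).
E = np.array([[1/2, 1/6, 1/3, 0.0],
              [1/6, 1/2, 1/3, 0.0],
              [0.0, 0.0, 2/3, 1/3],
              [1/3, 1/3, 0.0, 1/3]])
labels, E_c = cluster_and_contract(E, num_clusters=3)
print(labels)          # the duplicate Rocks share one cluster
print(E_c.round(2))    # contracted response graph edge weights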
Data availability
We use OpenSpiel146 as the backend providing many of the games and associated payoff datasets studied here (see Supplementary Methods 1 for details). Payoff datasets for empirical games in the literature are referenced in the main text.

Code availability
We use OpenSpiel146 for the implementation of α-Rank and Double Oracle.
30. Nelson, M. J. & Mateas, M. Towards automated game design. In Proc. Congress of the Italian Association for Artificial Intelligence (Springer, 2007).
31. Risi, S. & Togelius, J. Increasing generality in machine learning through procedural content generation. Nat. Mach. Intel. 2, 428–436 (2020).
32. Shaker, N., Togelius, J. & Nelson, M. J. Procedural Content Generation in Games (Springer, 2016).
33. Smith, A. M. & Mateas, M. Answer set programming for procedural content generation: a design space approach. IEEE T. Comp. Intel. AI 3, 187–200 (2011).
34. Togelius, J., Yannakakis, G. N., Stanley, K. O. & Browne, C. Search-based procedural content generation: a taxonomy and survey. IEEE T. Comp. Intel. AI 3, 172–186 (2011).
35. Wang, R., Lehman, J., Clune, J. & Stanley, K. O. Paired open-ended trailblazer (POET): endlessly generating increasingly complex and diverse learning environments and their solutions. Preprint at https://arxiv.org/abs/1901.01753 (2019).
36. Wang, R. et al. Enhanced POET: open-ended reinforcement learning through unbounded invention of learning challenges and their solutions. Preprint at https://arxiv.org/abs/2003.08536 (2020).
37. Czarnecki, W. M. et al. Real world games look like spinning tops. In Proc. Neural Information Processing Systems. Preprint at https://arxiv.org/abs/2004.09468 (2020).
38. Bruns, B. R. Names for games: locating 2 × 2 games. Games 6, 495–520 (2015).
39. Liebrand, W. B. G. A classification of social dilemma games. Sim. Games 14, 123–138 (1983).
40. Rapoport, A. & Guyer, M. A taxonomy of 2 × 2 games. Gen. Sys. 11, 203–214 (1966).
41. Balduzzi, D., Tuyls, K., Perolat, J. & Graepel, T. Re-evaluating evaluation. In Proc. Neural Information Processing Systems (Curran Associates Inc., 2018).
42. Balduzzi, D. et al. Open-ended learning in symmetric zero-sum games. In Proc. International Conference on Machine Learning (PMLR, 2019).
43. Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: an evaluation platform for general agents. J. Artif. Intel. Res. 47, 253–279 (2013).
44. Hernández-Orallo, J. The Measure of All Minds: Evaluating Natural and Artificial Intelligence (Cambridge University Press, 2017).
45. Hernández-Orallo, J. Evaluation in artificial intelligence: from task-oriented to ability-oriented measurement. Artif. Intel. Rev. 48, 397–447 (2017).
46. Hernández-Orallo, J., Flach, P. A. & Ferri, C. A unified view of performance metrics: translating threshold choice into expected classification loss. J. Mach. Learn. Res. 13, 2813–2869 (2012).
47. Lanctot, M. et al. A unified game-theoretic approach to multiagent reinforcement learning. In Proc. Neural Information Processing Systems (Curran Associates Inc., 2017).
48. Machado, M. C. et al. Revisiting the arcade learning environment: evaluation protocols and open problems for general agents. J. Artif. Intel. Res. 61, 523–562 (2018).
49. Muller, P. et al. A generalized training approach for multiagent learning. In Proc. International Conference on Learning Representations (OpenReview, 2020).
50. Omidshafiei, S. et al. α-Rank: multi-agent evaluation by evolution. Sci. Rep. 9, 1–29 (2019).
51. Rowland, M. et al. Multiagent evaluation under incomplete information. In Proc. Neural Information Processing Systems (Curran Associates Inc., 2019).
52. Balduzzi, D. et al. The mechanics of n-player differentiable games. In Proc. International Conference on Machine Learning (PMLR, 2018).
53. Tuyls, K., Perolat, J., Lanctot, M., Leibo, J. Z. & Graepel, T. A generalised method for empirical game theoretic analysis. In Proc. Autonomous Agents and Multi-Agent Systems (Springer Science & Business Media, 2018).
54. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M. & Hwang, D.-U. Complex networks: structure and dynamics. Phys. Rep. 424, 175–308 (2006).
55. Dehmer, M. Structural Analysis of Complex Networks (Springer Science & Business Media, 2010).
56. Van Steen, M. Graph Theory and Complex Networks: An Introduction (Self-published, 2010).
57. Scott, J. Social network analysis. Sociol. 22, 109–127 (1988).
58. Wasserman, S. & Faust, K. Social Network Analysis: Methods and Applications (Cambridge University Press, 1994).
59. Donato, D., Laura, L., Leonardi, S. & Millozzi, S. Large scale properties of the webgraph. Eur. Phys. J. B 38, 239–243 (2004).
60. Georgeot, B., Giraud, O. & Shepelyansky, D. L. Spectral properties of the Google matrix of the world wide web and other directed networks. Phys. Rev. E 81, 056109 (2010).
61. Bonchev, D. D. & Rouvray, D. Complexity in Chemistry, Biology, and Ecology (Springer Science & Business Media, 2007).
62. Lesne, A. Complex networks: from graph theory to biology. Lett. Math. Phys. 78, 235–262 (2006).
63. Pavlopoulos, G. A. et al. Using graph theory to analyze biological networks. BioData Min. 4, 10 (2011).
64. Hausmann, R. et al. The Atlas of Economic Complexity: Mapping Paths to Prosperity (MIT Press, 2014).
65. Tacchella, A., Cristelli, M., Caldarelli, G., Gabrielli, A. & Pietronero, L. Economic complexity: conceptual grounding of a new metrics for global competitiveness. J. Econ. Dyn. Control 37, 1683–1691 (2013).
66. Vitevitch, M. S. What can graph theory tell us about word learning and lexical retrieval? J. Speech Lang. Hear. Res. 51, 408 (2008).
67. Walsh, W. E., Das, R., Tesauro, G. & Kephart, J. O. Analyzing complex strategic interactions in multi-agent games. In AAAI Workshop on Game Theoretic and Decision Theoretic Agents (AAAI Press, 2002).
68. Wellman, M. P. Methods for empirical game-theoretic analysis. In Proc. AAAI Conference on Artificial Intelligence (AAAI Press, 2006).
69. Bowling, M., Burch, N., Johanson, M. & Tammelin, O. Heads-up limit hold'em poker is solved. Science 347, 145–149 (2015).
70. Burch, N., Johanson, M. & Bowling, M. Solving imperfect information games using decomposition. In Proc. AAAI Conference on Artificial Intelligence (AAAI Press, 2014).
71. Robinson, J. An iterative method of solving a game. Ann. Math. 54, 296–301 (1951).
72. Waugh, K., Morrill, D., Bagnell, J. A. & Bowling, M. Solving games with functional regret estimation. In Proc. AAAI Conference on Artificial Intelligence (AAAI Press, 2015).
73. McMahan, H. B., Gordon, G. J. & Blum, A. Planning in the presence of cost functions controlled by an adversary. In Proc. International Conference on Machine Learning (PMLR, 2003).
74. Bosansky, B., Jiang, A. X., Tambe, M. & Kiekintveld, C. Combining compact representation and incremental generation in large games with sequential strategies. In Proc. AAAI Conference on Artificial Intelligence (AAAI Press, 2015).
75. Jain, M. et al. A double oracle algorithm for zero-sum security games on graphs. In Proc. Autonomous Agents and Multi-Agent Systems (Springer Science & Business Media, 2011).
76. Regan, K. & Boutilier, C. Regret-based reward elicitation for Markov decision processes. In Proc. Association for Uncertainty in Artificial Intelligence (AUAI Press, 2009).
77. Singh, S. P., Kearns, M. J. & Mansour, Y. Nash convergence of gradient dynamics in general-sum games. In Proc. Association for Uncertainty in Artificial Intelligence (AUAI Press, 2000).
78. Mohar, B. The Laplacian spectrum of graphs. Graph Theo. Comb. Appl. 2, 12 (1991).
79. Arad, A. & Rubinstein, A. The 11-20 money request game: a level-k reasoning study. Am. Econ. Rev. 102, 3561–3573 (2012).
80. Liu, S. et al. Emergent coordination through competition. In Proc. International Conference on Learning Representations (OpenReview, 2019).
81. Borel, E. La théorie du jeu et les équations intégrales à noyau symétrique. C. R. Acad. Sci. 173, 58 (1921).
82. Newell, A. & Simon, H. A. Computer Science as Empirical Inquiry: Symbols and Search (ACM, 1976).
83. Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In Proc. International Conference on Computer Games (2006).
84. Berlingerio, M., Koutra, D., Eliassi-Rad, T. & Faloutsos, C. NetSimile: a scalable approach to size-independent network similarity. Preprint at https://arxiv.org/abs/1209.2684 (2012).
85. Robinson, D. & Goforth, D. The Topology of the 2 × 2 Games: A New Periodic Table, Vol. 3 (Psychology Press, 2005).
86. Shaker, M., Sarhan, M. H., Al Naameh, O., Shaker, N. & Togelius, J. Automatic generation and analysis of physics-based puzzle games. In Proc. IEEE International Conference on Computational Intelligence (IEEE, 2013).
87. Smith, G. & Whitehead, J. Analyzing the expressive range of a level generator. In Proc. Procedural Content Generation for Games (ACM, 2010).
88. Charity, M., Green, M. C., Khalifa, A. & Togelius, J. Mech-Elites: illuminating the mechanic space of GVGAI. Preprint at https://arxiv.org/abs/2002.04733 (2020).
89. Deterding, S. The lens of intrinsic skill atoms: a method for gameful design. Hum. Comput. Interact. 30, 294–335 (2015).
90. Hao, W. & Chuen-Tsai, S. Game reward systems: gaming experiences and social meanings. In Proc. DiGRA International Conference (Springer, 2011).
91. Koster, R. Theory of Fun for Game Design (O'Reilly Media, Inc., 2013).
92. Lazzaro, N. Why we play: affect and the fun of games. Hum. Comput. Interact. 155, 679–700 (2009).
93. Prensky, M. Fun, play and games: what makes games engaging. Dig. Game-based Learn. 5, 5–31 (2001).
94. Vygotsky, L. Interaction between learning and development. Read. Dev. Child 23, 34–41 (1978).
95. Baker, B. et al. Emergent tool use from multi-agent autocurricula. In Proc. International Conference on Learning Representations. https://openreview.net/forum?id=SkxpxJBKwS (2020).
96. Nielsen, T. S., Barros, G. A. B., Togelius, J. & Nelson, M. J. Towards generating arcade game rules with VGDL. In Proc. IEEE Conference on Computational Intelligence (IEEE, 2015).
97. Byde, A. Applying evolutionary game theory to auction mechanism design. In Proc. International Conference on E-Commerce (IEEE, 2003).
98. Hom, V. & Marks, J. Automatic design of balanced board games. In Proc. AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AAAI Press, 2007).
99. Yannakakis, G. N. & Hallam, J. Evolving opponents for interesting interactive computer games. (2004).
100. Hansen, N. The CMA evolution strategy: a comparing review. In Towards a New Evolutionary Computation, 75–102 (Springer, 2006).
101. Crandall, J. W. et al. Cooperating with machines. Nat. Commun. 9, 1–12 (2018).
102. Bloembergen, D., Tuyls, K., Hennes, D. & Kaisers, M. Evolutionary dynamics of multi-agent learning: a survey. J. Artif. Intel. Res. 53, 659–697 (2015).
103. de Witt, C. S. et al. Multi-agent common knowledge reinforcement learning. In Proc. Conference on Neural Information Processing Systems (Curran Associates Inc., 2019).
104. Durugkar, I., Liebman, E. & Stone, P. Balancing individual preferences and shared objectives in multiagent reinforcement learning. In Proc. International Joint Conference on Artificial Intelligence (IJCAI, 2020).
105. Leibo, J. Z., Zambaldi, V., Lanctot, M., Marecki, J. & Graepel, T. Multi-agent reinforcement learning in sequential social dilemmas. In Proc. Autonomous Agents and Multi-Agent Systems (Springer Science & Business Media, 2017).
106. Li, Z. & Wellman, M. P. Structure learning for approximate solution of many-player games. In Proc. AAAI Conference on Artificial Intelligence (AAAI Press, 2020).
107. Spooner, T. & Savani, R. Robust market making via adversarial reinforcement learning. In Proc. International Joint Conference on Artificial Intelligence (IJCAI, 2020).
108. Wright, M., Wang, Y. & Wellman, M. P. Iterated deep reinforcement learning in games: history-aware training for improved stability. (2019).
109. Feldman, J. Minimization of Boolean complexity in human concept learning. Nature 407, 630–633 (2000).
110. Santos, F. P., Santos, F. C. & Pacheco, J. M. Social norm complexity and past reputations in the evolution of cooperation. Nature 555, 242–245 (2018).
111. May, R. M., Levin, S. A. & Sugihara, G. Ecology for bankers. Nature 451, 893–894 (2008).
112. Miller, J. H. & Page, S. E. Complex Adaptive Systems: An Introduction to Computational Models of Social Life (Princeton University Press, 2009).
113. Scheffer, M. Critical Transitions in Nature and Society, Vol. 16 (Princeton University Press, 2009).
114. Sigmund, K. The Calculus of Selfishness, Vol. 6 (Princeton University Press, 2010).
115. Smith, J. M. & Smith, J. M. M. Evolution and the Theory of Games (Cambridge University Press, 1982).
116. Donahue, K., Hauser, O. P., Nowak, M. A. & Hilbe, C. Evolving cooperation in multichannel games. Nat. Commun. 11, 1–9 (2020).
117. Fudenberg, D. & Imhof, L. A. Imitation processes with small mutations. J. Econ. Theory 131, 251–262 (2006).
118. Imhof, L. A., Fudenberg, D. & Nowak, M. A. Evolutionary cycles of cooperation and defection. Proc. Natl Acad. Sci. USA 102, 10797–10800 (2005).
119. Van Segbroeck, S., Pacheco, J. M., Lenaerts, T. & Santos, F. C. Emergence of fairness in repeated group interactions. Phys. Rev. Lett. 108, 158104 (2012).
120. Vasconcelos, V. V., Santos, F. P., Santos, F. C. & Pacheco, J. M. Stochastic dynamics through hierarchically embedded Markov chains. Phys. Rev. Lett. 118, 058301 (2017).
121. Huertas-Rosero, A. F. A cartography for 2 × 2 symmetric games. Preprint at https://arxiv.org/abs/cs/0312005 (2004).
122. Phelps, S., Parsons, S. & McBurney, P. An evolutionary game-theoretic comparison of two double-auction market designs. In AAMAS Workshop on Agent-Mediated Electronic Commerce (Springer Science & Business Media, 2004).
123. Tuyls, K. et al. Bounds and dynamics for empirical game theoretic analysis. Auton. Agent. Multi-Ag. 34, 7 (2020).
124. Wellman, M. P., Kim, T. H. & Duong, Q. Analyzing incentives for protocol compliance in complex domains: a case study of introduction-based routing. In Workshop on the Economics of Information Security (WEIS, 2013).
125. Gintis, H. Game Theory Evolving, 2nd edn (Princeton University Press, 2009).
126. Hofbauer, J. & Sigmund, K. Evolutionary Games and Population Dynamics (Cambridge University Press, 1998).
127. Nash, J. F. Equilibrium points in n-person games. Proc. Natl Acad. Sci. USA 36, 48–49 (1950).
128. Sandholm, W. H. Population Games and Evolutionary Dynamics (Economic Learning and Social Evolution) (MIT Press, 2010).
129. Weibull, J. W. Evolutionary Game Theory (MIT Press, 1995).
130. Nowak, M. A. & Sigmund, K. Evolutionary dynamics of biological games. Science 303, 793–799 (2004).
131. Nowak, M. A., Sasaki, A., Taylor, C. & Fudenberg, D. Emergence of cooperation and evolutionary stability in finite populations. Nature 428, 646–650 (2004).
132. Taylor, C., Fudenberg, D., Sasaki, A. & Nowak, M. A. Evolutionary game dynamics in finite populations. B. Math. Biol. 66, 1621–1644 (2004).
133. Traulsen, A., Nowak, M. A. & Pacheco, J. M. Stochastic dynamics of invasion and fixation. Phys. Rev. E 74, 011909 (2006).
134. Traulsen, A., Pacheco, J. M. & Imhof, L. A. Stochasticity and evolutionary stability. Phys. Rev. E 74, 021905 (2006).
135. Veller, C. & Hayward, L. K. Finite-population evolution with rare mutations in asymmetric games. J. Econ. Theory 162, 93–113 (2016).
136. Hauert, C., Traulsen, A., Brandt, H., Nowak, M. A. & Sigmund, K. Via freedom to coercion: the emergence of costly punishment. Science 316, 1905–1907 (2007).
137. Malliaros, F. D. & Vazirgiannis, M. Clustering and community detection in directed networks: a survey. Phys. Rep. 533, 95–142 (2013).
138. Van Lierde, H. Spectral clustering algorithms for directed graphs. Master's thesis, Université Catholique de Louvain (2015).
139. Satuluri, V. & Parthasarathy, S. Symmetrizations for clustering directed graphs. In Proc. International Conference on Extending Database Technology (ACM, 2011).
140. Shi, J. & Malik, J. Normalized cuts and image segmentation. IEEE T. Pattern Anal. 22, 888–905 (2000).
141. Rokach, L. & Maimon, O. Clustering methods. In Data Mining and Knowledge Discovery Handbook, 321–352 (Springer, 2005).
142. Pham, D. T., Dimov, S. S. & Nguyen, C. D. Selection of K in K-means clustering. P. I. Mech. Eng. C-J. Mec. 219, 103–119 (2005).
143. Tibshirani, R., Walther, G. & Hastie, T. Estimating the number of clusters in a data set via the gap statistic. J. R. Stat. Soc. B 63, 411–423 (2001).
144. Elo, A. The Rating of Chess Players, Past and Present (Ishi Press International, 1978).
145. Hansen, N., Akimoto, Y. & Baudis, P. CMA-ES/pycma on GitHub. Zenodo, https://doi.org/10.5281/zenodo.2559634 (2019).
146. Lanctot, M. et al. OpenSpiel: a framework for reinforcement learning in games. Preprint at https://arxiv.org/abs/1908.09453 (2019).

Acknowledgements
The authors gratefully thank Marc Lanctot and Thore Graepel for insightful feedback on the paper. F.C.S. acknowledges the support of FCT-Portugal through grants PTDC/MAT/STA/3358/2014 and UIDB/50021/2020. The authors thank the three anonymous reviewers and the editor for their constructive and insightful feedback, which helped to significantly improve the clarity of the proposed method, experimental results, and discussions thereof.

Author contributions
S.O., K.T., and F.C.S. conceptualized the graph-based framework used in the paper. S.O. implemented and generated the experimental results (spectral analysis of response graphs, clusterings, landscape of games, procedural games generation). W.M.C. devised and implemented the policy sampling scheme for real-world games, and generated the payoff tables for the games listed in Fig. 1. S.O., K.T., W.M.C., F.C.S., M.R., J.C., D.H., P.M., J.P., B.D.V., A.G., and R.M. contributed actively to discussions, analysis, writing, and review of the paper.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41467-020-19244-4.

Correspondence and requests for materials should be addressed to S.O.

Peer review information Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Reprints and permission information is available at http://www.nature.com/reprints

Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.