
PREPRINT (arXiv:2111.14377v3 [cs.NE] 10 Mar 2022)

Collective Intelligence for Deep Learning: A Survey of Recent Developments

David Ha 1 and Yujin Tang 1

Abstract
In the past decade, we have witnessed the rise of deep learning to dominate the field of artificial
intelligence. Advances in artificial neural networks alongside corresponding advances in hardware
accelerators with large memory capacity, together with the availability of large datasets, enabled
practitioners to train and deploy sophisticated neural network models that achieve state-of-the-art
performance on tasks across several fields spanning computer vision, natural language processing,
and reinforcement learning. However, as these neural networks become bigger, more complex,
and more widely used, fundamental problems with current deep learning models become more
apparent. State-of-the-art deep learning models are known to suffer from issues that range from
poor robustness and an inability to adapt to novel task settings, to rigid and inflexible configuration
assumptions. Collective behavior, commonly observed in nature, tends to produce systems that are
robust, adaptable, and have less rigid assumptions about the environment configuration. Collective
intelligence, as a field, studies the group intelligence that emerges from the interactions of many
individuals. Within this field, ideas such as self-organization, emergent behavior, swarm optimization,
and cellular automata were developed to model and explain complex systems. It is therefore natural
to see these ideas incorporated into newer deep learning methods. In this review, we will provide
a historical context of neural network research’s involvement with complex systems, and highlight
several active areas in modern deep learning research that incorporate the principles of collective
intelligence to advance its current capabilities. We hope this review can serve as a bridge between
the complex systems and deep learning communities.

Keywords
Deep Learning, Reinforcement Learning, Cellular Automata, Self-Organization, Complex Systems

1 Google Brain, Tokyo, Japan.

Both authors contributed equally to this work.


Email: [email protected], [email protected]

Introduction
Deep learning (DL) is a class of machine learning methods that uses multi-layer (“deep”) neural networks
for representation learning. While artificial neural networks, trained with the backpropagation algorithm,
first appeared in the 1980s 69 , deep neural networks did not receive widespread attention until 2012 when a
deep artificial neural network solution trained on GPUs 39 won an annual image recognition competition 16
by a significant margin over the non-DL runner-up methods. This success demonstrated that DL,
when combined with fast hardware-accelerated implementations and the availability of large datasets,
is capable of achieving far better results on non-trivial tasks than conventional methods.
Practitioners quickly incorporated DL to address long-standing problems in several other
fields. In computer vision (CV), deep learning models are used in image recognition 27,57,75 and image
generation 33,92 . In natural language processing (NLP), deep language models can generate text 7,58,59 and
perform machine translation 76 . Deep learning has also been incorporated into reinforcement learning
(RL) to tackle vision-based computer games such as Doom 25 and Atari 46 , and play games with
large search spaces such as Go 74 and Starcraft 90 . Deep learning models are also deployed for mobile
applications like speech recognition 1 and speech synthesis 83 , demonstrating their wide applicability.

Figure 1. Recent advances in GPU hardware enable realistic 3D simulation of thousands of robot models 28 ,
such as the one shown in this figure from Rudin et al. 63 . Such advances open the door for large-scale 3D
simulation of artificial agents that can interact with each other and collectively develop intelligent behavior.

However, DL is not an elixir without side effects. While we are witnessing many successes and a
growing adoption of deep neural networks, fundamental problems with DL are also revealing themselves
more and more clearly as our models and training algorithms become bigger and more complex. DL
models are not robust in some cases. For example, it is now known that by simply modifying a few
pixels on the screen of a video game (a modification not even noticeable to humans), an agent
trained on unmodified screens that originally surpassed human performance can fail 56 . Also, CV
models trained without special treatment may fail to recognize rotated or similarly transformed examples;
in other words, our current models and training methods do not lend themselves to generalization to novel
task settings. Last but not least, most DL models do not adapt to changes. They make assumptions about
input and expect rigid configurations and stationarity of the environment, what statisticians think of as the
data generating process. For instance, they may expect a fixed number of inputs, in a predetermined order.
We cannot expect agents to act capably beyond the skills learned during training, and once these rigid
configurations are violated, the models do not perform well unless we retrain them or manually process
the inputs to be consistent with the expectations of their initial training configurations.

Figure 2. Neural network architecture of AlexNet 39 , the winner of the ImageNet competition in 2012.

Furthermore, with all these advances, the impressive feats in deep learning involve sophisticated
engineering efforts. For instance, the famous AlexNet 39 (See Figure 2), which put deep learning into
the spotlight in the computer vision community after winning ImageNet in 2012, presented a carefully
designed network architecture with a well-calibrated training procedure. Modern neural networks are
often even more sophisticated, and require a pipeline that spans network architecture to delicate training
schemes. Like many engineering projects, much labor and fine-tuning went into producing each result.
We believe that many of the limitations and side effects of deep learning stem from the fact that the
current practice of deep learning is similar to the practice of engineering. The way we are building modern
neural network systems is similar to the way we are building bridges and buildings, which are designs
that are not adaptive. To quote Pickering, author of The Cybernetic Brain 54 : “Most of the examples of
engineering that come to mind are not adaptive. Bridges and buildings, lathes and power presses, cars,
televisions, computers, are all designed to be indifferent to their environment, to withstand fluctuations,
not to adapt to them. The best bridge is one that just stands there, whatever the weather.”

Figure 3. Left: Trajan’s Bridge at Alcantara, built in 106 AD by the Romans 2 . Right: Army ants forming a bridge 35 .

In natural systems, where collective intelligence plays a big role, we see adaptive designs that emerge
due to self-organization, and such designs are very sensitive and responsive to changes in the world
around them. Natural systems adapt, and become part of their environment (See Figure 3 for an analogy).

As exemplified by the example of army ants collectively forming a bridge that adapts to its
environment, collective behavior, commonly seen in nature, tends to produce systems that are adaptable,
robust, and have less rigid assumptions about the environment configuration. Collective intelligence, as a
field, studies the shared intelligence that emerges from the interactions (such as collaboration, collective
efforts, and competition) of many individuals. Within this field, ideas such as self-organization, emergent
behavior, swarm optimization, and cellular automata were developed to model and explain complex
systems. It is therefore natural to see these ideas incorporated into newer deep learning methods.
We do not believe that deep learning models have to be built in the same vein as bridges. As
we will discuss later on, it did not have to be this way. The reason the deep learning field took
this course could simply be an accident of history. In fact, recently, several works have been
addressing the limitations of deep learning by combining it with ideas from collective intelligence, from
applying cellular automata to neural network-based image processing models 47,60 to re-defining how
problems in reinforcement learning can be approached using self-organizing agents 32,52,84 . As we witness
the continuous technological advances in parallel-computation hardware (which is naturally suited to
simulate collective behavior, see Figure 1 for an example), we can expect more works that incorporate
collective intelligence into problems that have been traditionally approached with deep learning.
The goal of this review is to provide a high-level survey of how ideas, tools, and insights central
to the field of collective intelligence (most notably self-organization, emergence, and swarm models)
have impacted different areas of deep learning, ranging from image processing and reinforcement learning
to meta-learning. We hope this review will provide some insights on future synergies between deep
learning and collective intelligence, which we believe will lead to meaningful breakthroughs in both fields.

Background: Collective Intelligence


Collective intelligence (CI) is a term widely used in areas like sociology, business, communication and
computer science. The definition of CI can be summarized as a form of distributed intelligence that is
constantly enhanced and coordinated, with the goal of achieving better results than any individual of
the group, through mutual recognition and enrichment of individuals 41,42 . The better results from CI
are attributed to three factors: diversity, independence and decentralization 82,87 .
For our purposes, we view collective intelligence, as a field, to be the study of the group intelligence
that emerges from interactions (whether collaborative or competitive) between many individuals. This
group intelligence is a product of emergence, which occurs when the group is observed to have properties
that the individuals composing the group do not have on their own, properties that emerge only when the
individuals of the group interact within a wider whole.
Examples of such systems abound in nature, where complex global behaviors toward mutual
goals emerge from simple local interactions/collaborations between individuals 15,40,81,89 . In this review,
we confine ourselves to the simulation of collective intelligence, rather than the analysis of CI
observed in nature and society. Decades of earlier work have explored the simulation of collective
behavior to gather insights from such simulations. Mataric 45 investigated the use of
physical mobile robots for studying social interactions leading to group behavior. They proposed a
set of basic interactions (e.g., collision avoidance, following, flocking, etc) with the hope that these
primitives would enable a group of autonomous agents to accomplish a common goal or to learn from
each other. Inspired by group behaviors observed in real ant colonies, Dorigo et al. 17 posed stigmergy
(a particular form of indirect communication used by social insects) as a distributed communication
paradigm and showed how it inspired novel algorithms for distributed optimization and
control problems. Moreover, Schweitzer and Farmer 72 applied Brownian agent models in many different
contexts. Combined with multi-agent systems and statistical approaches, the authors laid out a vision for
a coherent framework for understanding complex systems.
While some of these earlier works led to the discovery of algorithms that are applicable to optimization
problems (such as ant colony optimization for tackling the traveling salesman problem), many of
these works aim to use simulation models to understand the emergent phenomenon of collective
intelligence. This points to a fundamental difference between the goals of the collective intelligence and
artificial intelligence fields. In collective intelligence, the goal is to build models of complex systems that
can help us explain and understand emergent phenomena, which may have applications in understanding
real systems in nature and society. Artificial intelligence (in particular, the field of machine learning), on
the other hand, is concerned with optimization, classification, prediction, and solving a problem.
The early works we mentioned did not fully leverage the modeling power of DL or recent advances
in hardware, but they nonetheless consistently demonstrated the striking effects of CI: the systems
are self-organizing, are capable of optimization via swarm intelligence, and exhibit emergent behavior.
They suggest that concepts from CI are promising ideas that can be applied to
DL to produce solutions that are robust, adaptable, and have less rigid assumptions about the environment
configuration, which is the focus of this review.

Historical Background: Cellular Neural Networks


Ideas from complex systems, such as self-organization, that were used to model and understand emergent and
collective behavior have a long and interesting historical relationship with the development of artificial
neural networks. While connectionism and artificial neural networks came about in the 1950s with
the birth of artificial intelligence as a research field, our story begins in the 1970s, when a group of
electrical engineers led by the pioneer Leon Chua started developing nonlinear circuit theory and applying
it to computation. Chua is known for conceptualizing the Memristor in the 1970s (a device that has been
implemented only recently), and devising the Chua circuit, one of the first circuits to exhibit chaotic
behavior. In the 1980s, his group developed Cellular Neural Networks, which are computational systems
that resemble cellular automata (CA), but use neural networks in place of the algorithmic cells typically
seen in CA systems such as Conway’s Game of Life 13 or elementary cellular automata rules 93 .

Figure 4. Left: Typical configuration of a 2D Cellular Neural Network 43 . Right: Google trends for the terms
Deep Learning and Cellular Neural Network over time.

Cellular Neural Networks (CeNNs) 11,12 are artificial neural networks where each neuron, or cell,
can only interact with their immediate neighbors. In the most basic setting, the state of each cell is
continuously updated using a nonlinear function of the states of its neighbors and itself. Unlike modern
deep learning approaches which rely on digital, discrete-time computation, CeNNs are continuous-time
systems that are usually implemented with non-linear analog electronic components (See Figure 4, left),
making them very fast. The dynamics of CeNNs rely on independent local processing of information and
interaction between processing units, and like CAs, they also exhibit emergent behavior and can be made
to be Universal Turing Machines. However, they are vastly more general than discrete CAs and digital
computers: owing to their continuous state space, CeNNs exhibit emergent behavior, such as Turing
patterns, not seen in their discrete counterparts 21 .
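For concreteness, the cell dynamics in the standard Chua-Yang formulation 11,12 take the following form (our transcription of the usual presentation; $x_{ij}$ is the state of cell $(i,j)$, $u_{kl}$ and $y_{kl}$ are the inputs and outputs of cells in its $r$-neighborhood $N_r(i,j)$, $A$ and $B$ are the feedback and control templates shared by all cells, and $z$ is a bias):

$$\dot{x}_{ij}(t) = -x_{ij}(t) + \sum_{(k,l) \in N_r(i,j)} A_{kl}\, y_{kl}(t) + \sum_{(k,l) \in N_r(i,j)} B_{kl}\, u_{kl} + z, \qquad y_{ij}(t) = \tfrac{1}{2}\big(|x_{ij}(t)+1| - |x_{ij}(t)-1|\big)$$

Because the small templates $A$ and $B$ are shared by every cell, a CeNN is specified by just a handful of coefficients; as discussed below, designing one amounts to solving for those coefficients.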
From the 1990s to the mid 2000s, CeNNs became an entire subfield of AI research. Owing to their powerful
and efficient distributed computation, they found applications in image processing and texture analysis,
while their inherent analog computation was applied to solving PDEs and even to modeling biological
systems and organs 10 . There were thousands of peer-reviewed papers, textbooks, and an IEEE conference
devoted to CeNNs, with many proposals to scale them up, stack them, combine them with digital circuits, and
investigate different methods of training them (just like what we are currently seeing in deep learning).
At least two hardware startups were formed to produce CeNN hardware and devices.
But in the latter half of the 2000s, they suddenly disappeared from the scene! There is
hardly any mention of Cellular Neural Networks in the AI community after 2006. And from the 2010s,
GPUs took over as the predominant platform for neural network research, which led to the rebranding of
artificial neural networks to deep learning. See Figure 4 (right) for a comparison of the trends over time.
No one can really pinpoint the exact reason for the demise of Cellular Neural Networks in AI research.
Like the Memristor, perhaps CeNNs were ahead of their time. Or perhaps the eventual rise of consumer
GPUs provided a more compelling platform for deep learning. One can only imagine a parallel universe where
CeNN analog computer chips had won the Hardware Lottery 30 ; there, the state of AI might be very different,
with the world and all of our devices embedded with powerful distributed analog cellular automata.
However, one key difference between CeNNs and deep learning is accessibility, and in our opinion,
this is the main reason they did not catch on. In the current deep learning paradigm, there is an entire
ecosystem of tools designed to make it easy to train and deploy neural network models. It is also relatively
straightforward to train the parameters of a neural network with deep learning frameworks by providing it
with a dataset 9 , or a simulated task environment 29 . Deep learning tools are designed to be used by anyone
with a basic programming background. CeNNs, on the other hand, were designed for electrical engineers
at a time when most EE students knew more about analog circuits than programming languages.
To illustrate this difficulty, “training” a CeNN requires solving a system of at least nine ODEs
to determine the coefficients that govern the analog circuits and define the behavior of the system! In
practice, many practitioners needed to rely on a cookbook 10 of known solutions to problems and then
manually adjust those solutions for new problems. Eventually, genetic algorithms (and early versions of
backpropagation) were proposed to train CeNNs 38 , but these required simulation software to train and
test the circuits before deploying them on actual (and highly customized) CeNN hardware.
There are likely more lessons to be learned from Cellular Neural Networks. They were an immensely
powerful hybrid of analog and digital computation that truly synthesized cellular automata with neural
networks. Unfortunately, we probably only witnessed the very beginning of their full potential before their
demise. Ultimately, commodity GPUs and software tools that abstracted neural networks into simple
Python code enabled deep learning to take over. Although CeNNs have faded away, concepts and ideas
from complex systems, like CAs, self-organization, and emergent behavior have not. Even though these ideas
are now confined to digital hardware, we are witnessing a resurgence of Collective Intelligence concepts in many areas of
deep learning, from image generation, deep reinforcement learning, to collective and distributed learning
algorithms. As we will see, these concepts are advancing the state of deep learning research by providing
solutions to some limitations and restrictions of traditional artificial neural networks.

Figure 5. A neural cellular automaton trained to recognize MNIST digits, created by Randazzo et al. 60 , is also
available as an interactive web demo. Each cell is only allowed to see the contents of a single pixel and
communicate with its neighbors. Over time, a consensus forms as to which digit the pixel most likely belongs to,
but interestingly, disagreements may arise depending on the location of the pixel where the prediction is made.

Collective Intelligence for Deep Learning


Collective intelligence naturally arises from the interaction of multiple individuals in a network, and
it is no surprise to also see self-organizing behaviors emerge from artificial neural networks.
This is especially true when we employ repeated computation of identical modules with identical weight
parameters across the network. For example, Gilpin 20 observed the close connection between cellular
automata and convolutional neural networks (CNNs), a type of neural network often used in image
processing that applies the same weights (or filters) to all of its inputs. In fact, they show that any CA
can be represented by a certain kind of CNN, and give an elegant demonstration of Conway’s Game
of Life 13 running in a CNN, illustrating that in certain settings, CNNs can exhibit interesting self-organizing
behaviors. Recently, several works such as Mordvintsev et al. 47 that we will discuss later have exploited
the self-organizing properties of CNNs, and have developed neural network-based cellular automata for
applications such as image regeneration.
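To make this correspondence concrete, the sketch below (our illustration, not code from Gilpin 20 ) implements the Game of Life as exactly such a CNN: one fixed convolution that counts neighbors, followed by a pointwise nonlinear rule.

```python
# Conway's Game of Life as a convolution plus a pointwise rule (illustrative).
import numpy as np
from scipy.signal import convolve2d

# A 3x3 "filter" that counts the 8 neighbors of every cell at once.
NEIGHBOR_KERNEL = np.array([[1, 1, 1],
                            [1, 0, 1],
                            [1, 1, 1]])

def life_step(grid: np.ndarray) -> np.ndarray:
    """One update: convolution (neighbor count) followed by the CA rule."""
    neighbors = convolve2d(grid, NEIGHBOR_KERNEL, mode="same", boundary="wrap")
    # A cell is alive next step if it has 3 neighbors, or is alive with 2.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(np.uint8)

# A glider on a 16x16 toroidal grid: after 4 steps it translates diagonally.
grid = np.zeros((16, 16), dtype=np.uint8)
grid[1, 2] = grid[2, 3] = grid[3, 1] = grid[3, 2] = grid[3, 3] = 1
for _ in range(4):
    grid = life_step(grid)
```

Replacing the fixed kernel and rule with learned weights yields precisely the kind of neural CA discussed next.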
Other types of neural network architectures, such as Graph Neural Networks 14,64,94 , explicitly target
self-organization as a central feature, modeling the behavior of each node of a graph as identical neural
network modules that pass messages to their neighbors defined by the edges of a graph. GNNs have been
traditionally used to analyze graph domains such as social networks and molecular structures. Recent
work 22 has also demonstrated the ability of GNNs to learn rules for established CA systems such as
Voronoi diagrams, or the flocking behavior of swarms 71 . As we will discuss later, the self-organizing
properties of GNNs have recently been applied to the deep reinforcement learning domain, creating
agents with far superior generalization capabilities.

We have identified four areas of deep learning that have started to incorporate ideas related to collective
intelligence: (1) Image Processing, (2) Deep Reinforcement Learning, (3) Multi-agent Learning, and (4)
Meta-Learning. We will discuss each area in detail and provide examples in this section.

Image Processing
Implicit relationships and recurring patterns in nature (such as textures and scenery) make natural
images well suited to cellular automata approaches for learning alternative representations.
Like CeNNs, the Neural Cellular Automata (neural CA) model proposed by Mordvintsev et al. 47 treated
each individual pixel of an image as a single neural network cell. Each cell is trained to predict
its color based on the states of its immediate neighbors, thereby developing a model of morphogenesis
for image generation. They demonstrated that it was possible to train neural networks to reconstruct
entire images this way, even when each cell lacks information about its location and relies only on local
information from its neighbors. This approach enabled the generation algorithm to be resistant to noise,
and moreover, allowed images to regenerate when damaged. An extension of neural CA 60 enabled
individual cells to perform image classification tasks, such as handwritten digit classification (MNIST),
by only examining the contents of a single pixel and passing messages to the cell’s immediate
neighbors (See Figure 5). Over time, a consensus forms as to which digit the pixel most likely
belongs to, but interestingly, disagreements may arise depending on the location of the pixel, especially if the
image is intentionally drawn to represent different digits.
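Structurally, such a model can be sketched as below (our illustration in the spirit of Mordvintsev et al. 47 ; the channel count, layer sizes, and fire rate are assumptions, not their exact configuration): each step perceives the 3x3 neighborhood with fixed filters, applies a small learned network identically at every cell, and stochastically updates a random subset of cells so that no global clock is needed.

```python
# A minimal neural CA update step (illustrative sketch).
import torch
import torch.nn.functional as F

CHANNELS = 16  # per-cell state: visible color channels plus hidden channels

# Fixed perception filters: identity plus Sobel gradients, applied per channel.
ident = torch.tensor([[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]])
sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]) / 8.0
kernels = torch.stack([ident, sobel_x, sobel_x.T])       # (3, 3, 3)
kernels = kernels.repeat(CHANNELS, 1, 1).unsqueeze(1)    # (3*C, 1, 3, 3)

# The learned rule: a tiny network applied identically at every grid location.
update_rule = torch.nn.Sequential(
    torch.nn.Conv2d(3 * CHANNELS, 128, 1), torch.nn.ReLU(),
    torch.nn.Conv2d(128, CHANNELS, 1),
)

def ca_step(state: torch.Tensor, fire_rate: float = 0.5) -> torch.Tensor:
    """state: (batch, CHANNELS, H, W); each cell sees only its 3x3 neighborhood."""
    perceived = F.conv2d(state, kernels, padding=1, groups=CHANNELS)
    delta = update_rule(perceived)
    # Stochastic update: cells fire independently, so there is no global clock.
    mask = (torch.rand_like(state[:, :1]) < fire_rate).float()
    return state + delta * mask

state = torch.zeros(1, CHANNELS, 32, 32)
state[:, :, 16, 16] = 1.0      # a single seed cell "grows" into a pattern
for _ in range(64):
    state = ca_step(state)
```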

Figure 6. Neural CAs have also been applied to the regeneration of Minecraft entities. Sudhakaran et al. 80 ’s
formulation enabled the regeneration of not only Minecraft buildings and trees, but also simple functional machines
in the game, such as worm-like creatures that can even regenerate into two distinct creatures when cut in half.

The regeneration with neural CA has been explored beyond 2D images. In a later work, Zhang et al. 95
employed a similar approach to 3D voxel generation. This is particularly useful for high-resolution 3D
scanning, where 3D shape data is often described with sparse and incomplete points. Using generative
cellular automata, they can recover full 3D shapes from only a partial set of points. This approach is not
limited to purely generative domains; it can also be applied to the construction of artificial
agents in active environments such as Minecraft. Sudhakaran et al. 80 trained neural CAs to grow complex
entities from Minecraft such as castles, apartment blocks, and trees, some of which are composed of
thousands of blocks. Aside from regeneration, their system is able to regrow parts of simple functional
machines (such as a virtual creature in the game), and they demonstrate a morphogenetic creature growing
into two distinct creatures when cut in half in the virtual world (See Figure 6).
Cellular automata are also naturally suited to providing visual interpretations of images. Qin et al. 55
examined the use of a hierarchical CA model for visual saliency, to identify items in an image that
stand out. By getting a CA to operate on visual features extracted from a deep neural network, they were
able to iteratively construct multi-scale saliency maps of the image, with the final map converging on
the target items. Sandler et al. 66 later investigated the use of CA for the task of image segmentation,
an area where deep learning enjoys tremendous success. They demonstrated the viability of performing
complex segmentation tasks using CAs with relatively simple rules (with as few as 10K neural network
parameters), with the advantage that the approach can scale up to extremely large image sizes, a
challenge for traditional deep learning models with millions or even billions of model parameters, which
are bounded by GPU memory.

Deep Reinforcement Learning


The rise in deep learning gave birth to the use of deep neural networks for reinforcement learning, or
deep reinforcement learning (Deep RL), equipping reinforcement learning agents with modern neural
networks architectures that can address more complex problems, such as high dimensional continuous
control or vision-based tasks from pixel observations. Deep RL shares an appealing characteristic
with deep learning, in that employing sufficient computational resources will generally lead to a solution
of the target training task being found. But like deep learning, Deep RL has its share of limitations. Agents
trained to perform a particular task often fail when the task is slightly altered. Furthermore, neural
network solutions generally only work for a specific morphology with well-defined input and output
mappings. For instance, a locomotion policy trained for a 4-legged ant might not work for a 6-legged
one, and a controller that expects to receive 10 inputs will not work if given 5 or 20 inputs.

Figure 7. Examples of soft-bodied robot simulation in 2D and 3D. Each cell represents an individual neural
network with local sensory functions that produce local actions, including communicating to neighboring cells.
Training these systems to perform various locomotion tasks involves not only training the neural networks, but
also the design and placement of the soft cells that form the agent’s morphology. Figure from Horibe et al. 31

The evolutionary computation community started approaching some of these challenges earlier on,
by incorporating modularity 67,68 into the evolutionary process that governs the design of artificial agents.
Having agents composed of identical but independent modules fosters self-organization via local
interactions between the modules, enabling systems that are robust to changes in the agent’s morphology,
an essential requirement in evolutionary systems. These ideas have been presented in the literature
on soft-bodied robotics 8 , where robots consist of a grid of voxel cells, each controlled by an
independent neural network with local sensory functions that can produce a localized action. Through
message passing, the group of cells that make up the robot are able to self-organize and perform a range
of locomotion tasks (See Figure 7). Later work 36 even proposes incorporating metamorphosis in the
evolution of the placement of the cells to produce configurations robust to a range of environments.

[Figure 8 diagram: three panels, “Joint Training of Several Agents”, “Shared Modular Policies via Message Passing”, and “Emergent Centralized Controllers”, depicting per-actuator policy modules composed of shared bottom-up and top-down modules that exchange messages along the agent’s morphology.]
Figure 8. Traditional RL methods train a specific policy for a particular robot with a fixed morphology. But
recent work, like the one shown here by Huang et al. 32 attempts to train a single modular neural network
responsible for controlling a single part of a robot. The global policy of each robot is thus the result of
coordination of these identical modular neural networks. They show that such a system can generalize across
a variety of different skeletal structures, from hoppers to quadrupeds, and even to some unseen morphologies.
Recently, soft-bodied robots have even been combined with the neural CA approach discussed earlier
to enable these robots to regenerate themselves 31 . To bridge the gap between policy optimization (where
the goal is to find the best parameters of the policy neural network) usually done in the Deep RL
community and the type of morphology-policy co-evolution (where both the morphology and the policy
neural network is optimized together) work done in the soft-bodied literature, Bhatia et al. 5 has recently
developed an OpenAI Gym-like 6 environment called Evolution Gym, a benchmark for developing and
comparing algorithms for co-optimizing design and control, which provided an efficient soft-bodied robot
simulator written in C++ with a Python interface.
Modular, decentralized self-organizing controllers have also started to be explored in the Deep RL
community. Wang et al. 91 and Huang et al. 32 explored the use of modular neural networks to control
each individual actuator of a simulated robot for continuous control. They expressed a global locomotion
policy as a collection of modular neural networks (in the case of Huang et al. 32 , identical networks) that
correspond to each of the agent’s actuators, and trained the system using RL. Like soft-bodied robots,
every module is only responsible for controlling its corresponding actuator and receives information from
only its local sensors (See Figure 8). Messages are passed between neighboring modules, propagating
information between distant modules. They show that a single modular policy can generate locomotion
behaviors for several distinct robot morphologies, and show that the policies generalize to variations of
the morphologies not seen during training, such as creatures with extra legs. As in the case of soft-bodied
robots, these results also demonstrate the emergence of centralized coordination via message passing
between decentralized modules that are collectively optimizing for a shared reward.
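The essential structure can be sketched as follows (our illustration, not code from Wang et al. 91 or Huang et al. 32 ; all sizes and names are assumptions): one shared set of weights controls every actuator, and coordination arises only from local observations and messages exchanged along the morphology.

```python
# A shared modular policy: every limb runs the same weights (illustrative).
import numpy as np

rng = np.random.default_rng(0)
OBS, MSG, HID = 8, 4, 32
W_in = rng.normal(0, 0.1, (HID, OBS + MSG))   # shared by all limbs
W_act = rng.normal(0, 0.1, (1, HID))          # torque for this limb's actuator
W_msg = rng.normal(0, 0.1, (MSG, HID))        # outgoing message to neighbors

def module_step(local_obs, incoming_msg):
    """One limb: local sensors + pooled neighbor messages -> torque + message."""
    h = np.tanh(W_in @ np.concatenate([local_obs, incoming_msg]))
    return np.tanh(W_act @ h), np.tanh(W_msg @ h)

def policy_step(observations, edges, n_limbs, rounds=2):
    """Apply the same module at every limb; messages propagate along edges."""
    messages = np.zeros((n_limbs, MSG))
    torques = np.zeros(n_limbs)
    for _ in range(rounds):  # a few rounds spread information between limbs
        new_messages = np.zeros_like(messages)
        for i in range(n_limbs):
            neighbors = [j for a, j in edges if a == i]
            pooled = (np.mean([messages[j] for j in neighbors], axis=0)
                      if neighbors else np.zeros(MSG))
            torques[i], new_messages[i] = module_step(observations[i], pooled)
        messages = new_messages
    return torques

# A 3-limb chain (0-1-2); adding a limb only adds nodes/edges, not new weights.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
print(policy_step(rng.normal(size=(3, OBS)), edges, n_limbs=3))
```

Because the weights are shared, the same policy runs unchanged on a morphology with extra limbs, which is what makes the generalization described above possible.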
[Figure 9 diagram: training tasks (Standing Up along the Y axis, Locomotion along the X axis) on the left, and zero-shot generalization tests on the right, with variations including More Limbs, Fewer Limbs, Wind, Water, Bi-Modal, Bumps, Hurdles, Gaps, Stairs, and Valley.]

Figure 9. Self-organization also enables systems in RL environments to self-configure their own design for a
given task. Pathak et al. 52 explored such dynamic and modular agents and showed that they can generalize
not only to unseen environments, but also to unseen morphologies composed of additional modules.
The aforementioned work hints at the power of embodied cognition, which emphasizes the role of the
agent’s body in generating behavior. Although the focus of much of the work in Deep RL is on learning
neural network policies for an agent with a fixed design (e.g., a bipedal robot, humanoid, or robot arm),
embodied intelligence is an area that is gathering interest in the sub-field 23,52 . Inspired by previous work
on self-configuring modular robots 26,62,77 , Pathak et al. 52 investigate a collection of primitive agents that
learn to self-assemble into a complex body while also learning a local policy to control the body without
an explicit centralized control unit. Each primitive agent (which consists of a limb and a motor) can
link up with nearby agents, allowing for complex morphologies to emerge. Their results show that these
dynamic and modular agents are robust to changes in conditions, and that the policies generalize not
only to unseen environments, but also to unseen morphologies consisting of a greater number of modules.
We note that these ideas can be used to allow general DL systems (not confined to RL) to have more
flexible architectures that can even learn machine learning algorithms, and we will discuss this later on
in the Meta-Learning section.
Aside from adapting to changing morphologies and environments, self-organizing systems can also
adapt to changes in their sensory inputs. Sensory substitution refers to the brain’s ability to use one
sensory modality (e.g., touch) to supply environmental information normally gathered by another sense
(e.g., vision). However, most neural networks are not able to adapt to sensory substitutions. For instance,
most RL agents require their inputs to be in an exact, pre-specified rigid format, otherwise they will fail.

Figure 10. Using the properties of self-organization and attention, Tang and Ha 84 investigated RL agents that
treat their observations as an arbitrarily ordered, variable-length list of sensory inputs. They partition the input
in visual tasks such as CarRacing and Atari Pong 6,85 into a 2D grid of small patches, and shuffled their ordering
(Left). They also added many additional redundant noisy input channels in continuous control tasks 19 in a
shuffled order (Right), where the agent has to learn to identify which inputs are useful. Each sensory neuron in
the system receives a stream of a particular input, and through coordination, must complete the task at hand.

In a recent work, Tang and Ha 84 explored permutation invariant neural network agents that require each of
their sensory neurons (receptors that receive sensory inputs from the environment) to deduce the meaning
and context of its input signal, rather than explicitly assume a fixed meaning. They demonstrate that these
sensory networks can be trained to integrate information received locally, and through communication
between them using an attention mechanism, can collectively produce a globally coherent policy.
Moreover, the system can still perform its task even if the ordering of its sensory inputs (represented
as real-valued numbers) is randomly permuted several times during an episode. Their experiments show
that such agents are robust to observations that contain much additional redundant or noisy information,
as well as to observations that are corrupt or incomplete.
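The core mechanism can be sketched as follows (our simplified illustration, not Tang and Ha’s exact architecture 84 ; here each neuron’s feature is just its current and previous reading rather than an internal RNN state): every sensory neuron encodes its own input stream with shared weights, and a set of learned queries pools the results with attention, so the output is unchanged when the inputs are shuffled.

```python
# Permutation-invariant pooling over an unordered set of sensors (illustrative).
import numpy as np

rng = np.random.default_rng(0)
N_INPUTS, D_KEY, D_VAL, N_QUERIES = 10, 16, 8, 4
W_k = rng.normal(0, 0.1, (D_KEY, 2))        # shared by every sensory neuron
W_v = rng.normal(0, 0.1, (D_VAL, 2))
Q = rng.normal(0, 0.1, (N_QUERIES, D_KEY))  # learned queries, independent of order

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(obs, prev_obs):
    """obs, prev_obs: unordered length-N arrays of scalar sensor readings."""
    per_neuron = np.stack([obs, prev_obs], axis=1)   # each neuron sees only its own stream
    keys = per_neuron @ W_k.T                        # (N, D_KEY)
    values = per_neuron @ W_v.T                      # (N, D_VAL)
    weights = softmax(Q @ keys.T / np.sqrt(D_KEY))   # (N_QUERIES, N)
    return weights @ values                          # (N_QUERIES, D_VAL), order-free

obs, prev = rng.normal(size=N_INPUTS), rng.normal(size=N_INPUTS)
perm = rng.permutation(N_INPUTS)
# Shuffling the sensors (consistently across time) leaves the output unchanged.
assert np.allclose(attention_pool(obs, prev), attention_pool(obs[perm], prev[perm]))
```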

Multi-agent Learning
Collective intelligence can be viewed at several different scales. The brain can be viewed as a network of
individual neurons functioning collectively. Each organ can be viewed as a collection of cells performing
a collective function. Individual animals can be viewed as a collection of organs working together. As
we zoom out further, we can also look at human intelligence beyond biology and see human civilization
as a collective intelligence solving (and producing) problems that are beyond the capabilities of a single
person. As such, while in the previous section, we discussed several works that leverage the power of
collective intelligence to essentially decompose a single RL agent into a collection of smaller RL agents
working together towards a collective goal, resembling a model of collective intelligence at the biological
level, we can also view multi-agent problems as a model of collective intelligence at the societal level.
A major focus of the collective intelligence field is to study the group intelligence and behaviors
that emerge from a large collection of individuals, whether in humans (Tapscott and Williams, 2008),
animals 81 , insects 17,73 , or artificial swarm robots 26,62 . This focus has largely been missing in the Deep RL
field. While multi-agent reinforcement learning (MARL) is a well-established branch of Deep RL, most
learning algorithms and environments proposed have targeted a relatively small number of agents 18,49 ,
which is not sufficient to study the emergent properties of large populations. In the most common
MARL environments 3,34,61,88 , “multi-agent” simply means 2 or 4 agents trained to perform a task by
means of self-play 4,24,44 . Collective intelligence observed in nature or in society, however, relies on a
much larger number of individuals than typically studied in MARL, involving population sizes from
thousands to millions. In this section, we will discuss recent works from the MARL sub-field of Deep RL
that have been inspired by collective intelligence (as their authors have noted in their publications).
Unlike most MARL works, these works employ large populations of agents (each enabled by a
neural network), from thousands to millions, in order to truly study their emergent properties at the macro
level (1000+ agents), rather than at the micro level (2-4 agents).

Figure 11. MAgent 96 is a set of environments where large numbers of pixel agents in a gridworld interact in
battles or other competitive scenarios. Unlike most platforms that focus on RL research with a single agent or
only a few agents, their aim is to support RL research that scales up to millions of agents. The environments in
this platform are now maintained as part of the PettingZoo 88 open-source library for multi-agent RL research.

Recent advances in Deep RL have demonstrated the capabilities of simulating thousands of agents in
complex 3D simulation environments using only a single GPU 28,63 . A key challenge is in approaching the
problem of multi-agent learning at a much larger scale, leveraging such advances in parallel computing
hardware and distributed computation, with the goal of training millions of agents. In this section, we
will examine recent attempts at training a massive number of agents that interact in a collective setting.

Rather than focusing on realistic physics or environment realism, Zheng et al. 96 developed a platform
called MAgent, a simple grid-world environment that can host millions of neural network agents. Their
focus is on scalability, and they demonstrate that MAgent can support up to a million agents on a single
GPU (in 2017). Their platform supports interactions among the population of agents, and facilitates not
only the study of learning algorithms for policy optimization, but more critically, enables the study of
social phenomena emerging from the millions of agents in an AI society, including languages and
societal hierarchy structures that may emerge. Environments can be built using scripting, and they have
provided examples such as predator-prey simulations, battlefields, and adversarial pursuit, supporting
different species of distinct agents that may exhibit different behaviors.
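The key to this kind of scalability is that the whole population lives in arrays, so one environment step costs a few vectorized operations regardless of the number of agents. The sketch below is our generic illustration of that pattern (it is not the MAgent API; the names and the toy crowding rule are ours):

```python
# Stepping a million gridworld agents with array operations (illustrative).
import numpy as np

rng = np.random.default_rng(0)
N, GRID = 1_000_000, 1024                 # a million agents on a 1024x1024 grid
pos = rng.integers(0, GRID, size=(N, 2))
health = np.full(N, 10.0)
MOVES = np.array([[0, 1], [0, -1], [1, 0], [-1, 0], [0, 0]])

def step(actions):
    """actions: (N,) integers indexing MOVES; moves every agent at once."""
    global pos, health
    pos = (pos + MOVES[actions]) % GRID   # toroidal world, one array operation
    # Toy crowding penalty: agents sharing a cell lose health, a stand-in for
    # competition over local resources.
    cell_ids = pos[:, 0] * GRID + pos[:, 1]
    _, inverse, counts = np.unique(cell_ids, return_inverse=True, return_counts=True)
    health -= 0.1 * (counts[inverse] - 1)

# Random actions stand in for the neural network policies.
step(rng.integers(0, len(MOVES), size=N))
```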
MAgent inspired many recent applications, including multi-agent driving 53 , which looks at emergent
behavior of entire populations of driving agents to optimize driving policies that not only affect a
single car, but aim to improve the safety of the population as a whole. These directions are good examples
that demonstrate the difference between problems framed for deep learning (finding a driving policy for a
single car) versus problems in collective intelligence (finding a driving policy for the entire population).

Figure 12. Neural MMO 79 is a platform that simulates populations of agents in procedurally generated virtual
worlds to support multi-agent research while keeping its requirements computationally accessible. Users
select from a set of provided game systems to create environments for their specific research problems, with
support for up to a thousand agents and one square kilometer maps over several thousand time steps. The
project is under active development, with extensive documentation and with logging and visualization
tools for researchers. As of writing, this platform is to be demoed at the NeurIPS 2021 conference.

Inspired by the game genre of MMORPGs (Massively Multiplayer Online Role-Playing Games,
aka MMOs), Neural MMO 79 is an AI research environment that supports a large number of artificial
agents that have to compete for finite resources in order to survive. As such, their environment enables
large-scale simulation of multi-agent interactions that require agents to learn combat and navigation
policies alongside other agents in a large population all attempting to do the same. Unlike most MARL
environments, each agent is allowed to have its own distinct set of neural network weights, which has
been a technical challenge in terms of memory consumption. Preliminary experimental results in early
versions of the platform 78 demonstrated that agents with distinct neural network weight parameters developed
skills to fill different niches in order to avoid competition within a large population of agents.

As of writing, this project is in active development within the NeurIPS machine learning community 79 ,
working towards studies of large agent populations, long time horizons, open-ended tasks, and modular game
systems. The developers provide active support and documentation, and also develop additional training,
logging, and visualization tools to enable this line of large-scale multi-agent research. This work is still
in its early stages, and only time will tell if platforms that enable the study of large populations such as
Neural MMO or MAgent gain further traction within the Deep RL communities.

Meta-Learning
In the previous sections, we described works that express the solution to problems in terms of a collection
of independent neural network agents acting together to achieve a common goal. The parameters
of these neural network models are optimized for the collective performance of the population. While
these systems have been shown to be robust and adaptive to changes in their environment, they are ultimately
hardwired to perform a certain task, and cannot perform another task unless retrained from scratch.
Meta-learning is an active area of research within deep learning where the goal is to train the system
to learn. It is a large sub-field of ML, including areas such as simple transfer learning from one training
set to another. For our purposes, we follow the line of work from Schmidhuber 70 , where he views
meta-learning as the problem of ML algorithms that can learn better ML algorithms, which he believes is
required to build truly self-improving AI systems.
So unlike traditional training of a neural network to perform one task, where the weight parameters
are optimized with a gradient descent algorithm or with evolution strategies 86 , the goal of
meta-learning is to train a meta-learner (which can be another neural network-based system) to learn a
learning algorithm. This is a particularly challenging task, with a long history
(see Schmidhuber 70 for a review). In this section, we will highlight recent promising works that make use
of collective agents that can learn to learn, rather than learn to perform only a particular task (which we
have covered in the previous section).
Concepts from self-organization can be naturally applied to train neural networks to meta-learn by
extending the basic building blocks that compose artificial neural networks. As we know, artificial
neural networks consist of identical neurons which are modeled as non-linear activation functions. These
neurons are connected in a network by synapses, weight parameters that are normally trained
with a learning algorithm such as gradient descent. But one can imagine extending the abstraction of
neurons and synapses beyond static activation functions and floating point parameters. Indeed, recent
works 48,50 have explored modeling each neuron of a neural network as an individual reinforcement
learning agent. Using the terminology of RL, each neuron’s observations are its current state, which
changes as information is transmitted through the network, and each neuron’s actions enable it
to modify its connections with other neurons in the system. The problem of learning to learn is thus
treated as a multi-agent RL problem where each agent is part of the collection of neurons in a neural
network. While this approach is elegant, the aforementioned works are only capable of learning to solve
toy problems and are not yet competitive with existing learning algorithms.
Recent methods have gone beyond using simple scalar weights to transmit scalar signals between
neurons. Sandler et al. 65 introduce a new type of generalized artificial neural network where both neurons
and synapses have multiple states. Traditional artificial neural networks can be viewed as a special case
of their framework with two states, where one is used for activations and the other for gradients
produced by the backpropagation learning rule. In the general framework, they do not require the
backpropagation procedure to compute any gradients, and instead rely on a shared local learning rule
for updating the states of the synapses and neurons. This Hebbian-style bi-directional local update rule
requires only that each synapse and neuron have access to state information from their neighboring
synapses and neurons, similar to cellular automata. The rule is parameterized as a low-dimensional genome
vector, and is consistent across the system. They employed either evolution strategies or conventional
optimization techniques to meta-learn this genome vector, and their main result is that the update rules
meta-learned on the training tasks generalize to unseen test tasks. Furthermore, the update rules
learn faster than gradient-descent-based learning algorithms on several standard classification tasks.
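The flavor of such a rule can be sketched as follows (our illustration; the three-term rule and tiny genome below are assumptions, not Sandler et al.’s exact parameterization 65 ): every synapse updates itself using only the states of the two neurons it connects, with one genome shared across the entire network, and the genome itself is what gets meta-learned (e.g., by evolution strategies).

```python
# A shared, local, Hebbian-style update rule (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
genome = rng.normal(0, 0.1, size=3)  # meta-learned; shared by all synapses

def forward(pre, W):
    """Ordinary forward pass from pre-synaptic to post-synaptic activations."""
    return np.tanh(W @ pre)

def local_update(W, pre, post, feedback):
    """Each synapse W[i, j] sees only its own neighborhood: pre-neuron j,
    post-neuron i, and a locally delivered feedback state for neuron i."""
    a, b, c = genome
    hebb = np.outer(post, pre)          # correlation of the two endpoints
    fb = np.outer(feedback, pre)        # feedback carried by the post-neuron state
    return W + a * hebb + b * fb + c * W

W = rng.normal(0, 0.5, (4, 6))
pre = rng.normal(size=6)
post = forward(pre, W)
feedback = rng.normal(size=4)           # stand-in for a locally stored error state
W = local_update(W, pre, post, feedback)
```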

[Figure 13 diagram: activations at time t mapped to activations at time t+1 through a grid of identical RNNs, one per synapse, each with its own hidden state s_ij; messages m_i pass bi-directionally between layers of neurons.]
Figure 13. Recent works by Sandler et al. 65 and Kirsch et al. 37 attempt to generalize the accepted notion of
artificial neural networks, where each neuron can hold multiple states rather than a scalar value, and each
synapse functions bi-directionally to facilitate both learning and inference. In this figure, Kirsch et al. 37 use an
identical recurrent neural network (RNN) (with different internal hidden states) to model each synapse, and
show that the network can be trained by simply running the RNNs forward, without using backpropagation.

A similar direction has been taken by Kirsch et al. 37 , where the neurons and synapses of a neural
network are also generalized to higher-dimensional message-passing systems, but in their case each synapse
is replaced by a recurrent neural network (RNN) with the same shared parameters. These RNN synapses
are bi-directional and govern the flow of information across the network. Like Sandler et al. 65 , the bi-
directional property allows for the network to be used for both inference and learning at the same time
by running the system in forward-pass mode. The weights of this system are essentially stored in the
hidden states of the RNNs, so by simply running the system, the network can train itself using error
signals as feedback. Since RNNs are general-purpose computers, the authors were able to demonstrate that
the system can encode the gradient-based backpropagation algorithm by training the system to simply
emulate backpropagation, rather than explicitly calculating gradients via hand-engineering. Of course,
their system is much more general than backpropagation, and thus capable of learning new learning
algorithms that are much more efficient than backpropagation (See Figure 13).
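A toy version of this idea is sketched below (our illustration, not Kirsch et al.’s implementation 37 ; the sizes and readout are assumptions): every synapse runs the same tiny RNN with its own hidden state, so the network’s effective weights live in those hidden states and change simply by running the system forward on activations and error signals.

```python
# "Weights" stored in the hidden states of shared RNN synapses (illustrative).
import numpy as np

rng = np.random.default_rng(0)
N_IN, N_OUT, HID = 3, 2, 4
W_h = rng.normal(0, 0.3, (HID, HID))  # one parameter set shared by all synapses
W_x = rng.normal(0, 0.3, (HID, 2))    # per-step inputs: forward signal, error signal
w_out = rng.normal(0, 0.3, HID)       # reads a message out of each hidden state

# Each of the N_OUT x N_IN synapses keeps its own hidden state.
h = rng.normal(0, 0.1, (N_OUT, N_IN, HID))

def step(x, error):
    """One joint inference-and-learning step: the same recurrent update consumes
    both the forward activations and any incoming error signal."""
    global h
    inp = np.stack(np.broadcast_arrays(x[None, :], error[:, None]), axis=-1)
    h = np.tanh(h @ W_h.T + inp @ W_x.T)   # shared RNN, per-synapse hidden state
    messages = h @ w_out                    # (N_OUT, N_IN) forward messages
    return messages.sum(axis=1)             # each output neuron sums its inputs

x = rng.normal(size=N_IN)
y = step(x, error=np.zeros(N_OUT))          # pure inference
y = step(x, error=y - np.ones(N_OUT))       # feeding an error adapts the states
```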
The two works mentioned above were only recently published at the time of writing,
and we believe that these decentralized local meta-learning approaches have the potential to revolutionize
the way neural networks are used in the future, challenging the current paradigm that separates
model training and model deployment. There is still much work to be done in demonstrating that these
approaches can scale to larger datasets, given their inherently larger memory requirements (owing to
the much larger internal states of the system). Furthermore, while the algorithms are able to produce learning
algorithms that are vastly more sample efficient compared to gradient descent, this efficiency is only
apparent in the early stages of learning, and performance tends to peak very early on. Gradient descent,
while less efficient, is less biased towards few-shot learning, and can continue to run for many more cycles
to refine the weight parameters that will ultimately produce networks that achieve higher performance.

Discussion
In this survey, we first gave a brief historical background to describe the intertwined development of
deep learning and collective intelligence research. The two research areas were born at roughly the
same time, and we can also spot some positive correlations between the rises and falls of the two areas
throughout their history. This is no coincidence, since advances and breakthroughs in one of the two
areas can usually innovate new ideas or complement the solutions to the problems in the other. For
example, introducing deep neural networks and related training algorithms to cellular automata allowed
us to develop image generation algorithms that are resistant to noise and have “self-healing” properties.
This survey explored several works in deep learning that were also inspired by concepts in collective
intelligence. At a macro level, collective intelligence in multi-agent deep RL has led to interesting works
that can exceed human performance through collective self-play, and to decentralized self-organizing
robot controllers. At a micro level, collective intelligence is also embedded inside advanced methods
that simulate each neuron, synapse, or other object at a finer granularity within a deep model.
Despite the progress made in the works described in this survey, many challenges lie ahead. While
neural CA techniques have been applied to image processing, their application has so far been limited
to relatively small and simple datasets, and their image generation quality is still far below the state-
of-the-art on more sophisticated datasets such as ImageNet or Celebrity Faces 51 . For Deep RL, while
the surveyed works have demonstrated that a global policy can be replaced by a collection of smaller
individual policies, we have yet to transfer these experiments to real physical robots. Finally, we
have witnessed self-organization guide meta-learning algorithms. While this line of work is extremely
promising, it is currently confined to small-scale experiments due to the large computational
requirements that come with replacing every single neural connection with an entire artificial neural
network. We believe many challenges will be solved in due time as their trajectories are already in motion.
Looking at their respective development trajectories, DL has accomplished notable achievements
in developing novel architectures and training algorithms that led to efficient learning and better
performance. The research and development cycle of DL is more engineering-focused; as such, the
advances seen are more benchmark-based (such as classification accuracy for image recognition
problems, or related quantitative metrics for language modeling and machine translation problems). DL
advances are generally more incremental and predictable in nature, while CI focuses more on problem
formulations and environmental mechanisms that motivate novel emergent group behavior. As we have
shown in this survey, CI-based techniques enable new capabilities that were simply not possible before.
For instance, it is impossible to incrementally improve a fixed robot into one capable of self-assembly,
and thereby gain all the benefits of such modularity. Naturally, the two areas can complement each
other. We are confident that the hand-in-hand style of co-development will continue.

Glossary of Terms and Definitions

Deep Learning Related

Deep Learning (field): The study of machine learning methods based on artificial neural networks. Much of
the field is devoted to research on the numerous architectures, their training methods, theoretical properties,
and applications of artificial neural networks.

Supervised Learning: An approach to learning where both the data and the expected outputs (training signal)
are given.

Unsupervised Representation Learning: An approach to learning to represent data in a latent space (the
dimension of which is usually, but not necessarily, lower than that of the input data) without additional
training signals.

Transfer Learning: A research problem in ML that focuses on applying knowledge gained from one problem
to solve a different but related problem.

Meta-Learning: A large sub-field of ML, including areas such as simple transfer learning from one training
set to another. For our purposes, we view meta-learning as the problem of machine learning algorithms that
can learn better machine learning algorithms, which many believe is required to build truly self-improving
artificially intelligent systems.

Reinforcement Learning: An area of ML consisting of methods that train an agent to improve its policy from
interactions with the environment, or from experiences, in order to achieve goals.

Agent / Controller: An (artificial) agent or a controller is a system that takes actions corresponding to a
series of inputs in order to achieve goals.

Policy: A (control) policy is the “guide book” by which an agent makes decisions for its actions. In deep RL,
a policy usually takes the form of an artificial neural network which accepts the inputs from the
task/environment and outputs the corresponding actions.

Self-Play: A training scheme in RL where an agent is trained by playing against/with snapshots of itself.

Convolutional Neural Networks: A class of artificial neural networks commonly applied to imagery data.
Their connectivity pattern resembles the organization of the animal visual cortex.

Recurrent Neural Networks: A class of artificial neural networks most commonly applied to analyze
sequential/temporal data. RNNs can use their internal states to process inputs of variable lengths.

Graph Neural Networks: A class of artificial neural networks for processing data best represented by graph
data structures. Examples of such data include social networks, molecule structures, robot morphologies, etc.

Graphics Processing Unit: A GPU is a specialized electronic circuit designed to rapidly accelerate the
creation of images. The highly parallel structure of GPUs makes them efficient for algorithms that process
large blocks of data in parallel, and they are therefore widely adopted in DL research.

MNIST: A dataset of handwritten digits that is commonly used for training image processing systems.

Collective Intelligence Related Concepts

Collective Intelligence (field): The study of the shared, or group, intelligence that emerges from the
interaction (collaboration, collective efforts, and/or competition) of a large group of individuals.

Self-Organization: A process where some form of overall order arises from (local) interactions between
parts within a system.

Other Concepts

Complex Systems: Systems whose behavior is intrinsically difficult to model due to the dependencies and
interactions between the parts within the system and/or across time.

Cellular Automaton: A CA is a collection of cells on a grid, where each cell evolves its state over a set of
discrete values according to predefined rules based on the states of the neighboring cells.

Embodied Cognition: A theory stating that cognition is shaped by aspects of the entire body of the organism.
It emphasizes the role of the body (e.g., motor, perception) in forming cognitive features (e.g., forming
concepts, making judgements).

References
1. Alam M, Samad MD, Vidyaratne L, Glandon A and Iftekharuddin KM (2020) Survey on deep neural networks
in speech and vision systems. Neurocomputing 417: 302–321.
2. Wikipedia Authors (2022) Trajan's Bridge at Alcántara. Wikipedia.
3. Baker B, Kanitscheider I, Markov T, Wu Y, Powell G, McGrew B and Mordatch I (2019) Emergent tool use
from multi-agent autocurricula. arXiv preprint arXiv:1909.07528 .
4. Bansal T, Pachocki J, Sidor S, Sutskever I and Mordatch I (2017) Emergent complexity via multi-agent
competition. arXiv preprint arXiv:1710.03748 .
5. Bhatia J, Jackson H, Tian Y, Xu J and Matusik W (2021) Evolution gym: A large-scale benchmark for
evolving soft robots. In: Advances in Neural Information Processing Systems. Curran Associates, Inc. URL
https://sites.google.com/corp/view/evolution-gym-benchmark/.
6. Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J and Zaremba W (2016) OpenAI Gym.
arXiv preprint arXiv:1606.01540.
7. Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A
et al. (2020) Language models are few-shot learners. arXiv preprint arXiv:2005.14165 .
8. Cheney N, MacCurdy R, Clune J and Lipson H (2014) Unshackling evolution: evolving soft robots with multiple
materials and a powerful generative encoding. ACM SIGEVOlution 7(1): 11–23.
9. Chollet F et al. (2015) Keras. URL https://keras.io.
10. Chua LO and Roska T (2002) Cellular neural networks and visual computing: foundations and applications.
Cambridge University Press.
11. Chua LO and Yang L (1988) Cellular neural networks: Applications. IEEE Transactions on Circuits and Systems
35(10): 1273–1290.
12. Chua LO and Yang L (1988) Cellular neural networks: Theory. IEEE Transactions on Circuits and Systems
35(10): 1257–1272.
13. Conway J et al. (1970) The game of life. Scientific American 223(4): 4.
14. Daigavane A, Ravindran B and Aggarwal G (2021) Understanding convolutions on graphs. Distill DOI:
10.23915/distill.00032. URL https://distill.pub/2021/understanding-gnns.
15. Deneubourg JL and Goss S (1989) Collective patterns and decision-making. Ethology Ecology & Evolution
1(4): 295–311.
16. Deng J, Dong W, Socher R, Li LJ, Li K and Fei-Fei L (2009) ImageNet: A large-scale hierarchical image
database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, pp. 248–255.
17. Dorigo M, Bonabeau E and Theraulaz G (2000) Ant algorithms and stigmergy. Future Generation Computer
Systems 16(8): 851–871.
18. Foerster JN, Assael YM, De Freitas N and Whiteson S (2016) Learning to communicate with deep multi-agent
reinforcement learning. arXiv preprint arXiv:1605.06676 .
19. Freeman CD, Metz L and Ha D (2019) Learning to predict without looking ahead: World models without forward
prediction. URL https://learningtopredict.github.io.
20. Gilpin W (2019) Cellular automata as convolutional neural networks. Physical Review E 100(3): 032402.
21. Goraş L, Chua LO and Leenaerts D (1995) Turing patterns in CNNs. I. Once over lightly. IEEE Transactions on
Circuits and Systems I: Fundamental Theory and Applications 42(10): 602–611.
22. Grattarola D, Livi L and Alippi C (2021) Learning graph cellular automata. In: Advances in Neural Information Processing Systems 34.
23. Ha D (2018) Reinforcement learning for improving agent design URL https://designrl.github.io.
24. Ha D (2020) Slime volleyball gym environment. https://github.com/hardmaru/slimevolleygym.
25. Ha D and Schmidhuber J (2018) Recurrent world models facilitate policy evolution. In: Advances in Neural
Information Processing Systems 31. Curran Associates, Inc., pp. 2451–2463. URL https://papers.
nips.cc/paper/7512-recurrent-world-models-facilitate-policy-evolution.
https://worldmodels.github.io.
26. Hamann H (2018) Swarm robotics: A formal approach. Springer.
27. He K, Zhang X, Ren S and Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the
IEEE conference on computer vision and pattern recognition. pp. 770–778.
28. Heiden E, Millard D, Coumans E, Sheng Y and Sukhatme GS (2021) NeuralSim: Augmenting dif-
ferentiable simulators with neural networks. In: Proceedings of the IEEE International Confer-
ence on Robotics and Automation (ICRA). URL https://github.com/google-research/
tiny-differentiable-simulator.
29. Hill A, Raffin A, Ernestus M, Gleave A, Kanervisto A, Traore R, Dhariwal P, Hesse C, Klimov O, Nichol A
et al. (2018) Stable baselines.
30. Hooker S (2020) The hardware lottery. arXiv preprint arXiv:2009.06489 URL https://
hardwarelottery.github.io/.
31. Horibe K, Walker K and Risi S (2021) Regenerating soft robots through neural cellular automata. In: EuroGP.
pp. 36–50.
32. Huang W, Mordatch I and Pathak D (2020) One policy to control them all: Shared modular policies for agent-
agnostic control. In: International Conference on Machine Learning. PMLR, pp. 4455–4464.
33. Jabbar A, Li X and Omar B (2021) A survey on generative adversarial networks: Variants, applications, and
training. ACM Computing Surveys (CSUR) 54(8): 1–49.
34. Jaderberg M, Czarnecki WM, Dunning I, Marris L, Lever G, Castaneda AG, Beattie C, Rabinowitz NC,
Morcos AS, Ruderman A et al. (2019) Human-level performance in 3d multiplayer games with population-
based reinforcement learning. Science 364(6443): 859–865.
35. Jenal M (2011) What ants can teach us about the market. URL https://www.jenal.org/
what-ants-can-teach-us-about-the-market/.
36. Joachimczak M, Suzuki R and Arita T (2016) Artificial metamorphosis: Evolutionary design of transforming,
soft-bodied robots. Artificial life 22(3): 271–298.
37. Kirsch L and Schmidhuber J (2020) Meta learning backpropagation and improving it. arXiv preprint
arXiv:2012.14905 .
38. Kozek T, Roska T and Chua LO (1993) Genetic algorithm for cnn template learning. IEEE Transactions on
Circuits and Systems I: Fundamental Theory and Applications 40(6): 392–402.
39. Krizhevsky A, Sutskever I and Hinton GE (2012) ImageNet classification with deep convolutional neural
networks. Advances in Neural Information Processing Systems 25: 1097–1105.
40. Lajad R, Moreno E and Arenas A (2021) Young honeybees show learned preferences after experiencing
adulterated pollen. Scientific reports 11(1): 1–11.
41. Leimeister JM (2010) Collective intelligence. Business & Information Systems Engineering 2(4): 245–248.
42. Lévy P (1997) Collective intelligence.
43. Liu JB, Raza Z and Javaid M (2020) Zagreb connection numbers for cellular neural networks. Discrete Dynamics
in Nature and Society 2020.
44. Liu S, Lever G, Merel J, Tunyasuvunakool S, Heess N and Graepel T (2019) Emergent coordination through
competition. arXiv preprint arXiv:1902.07151 .
45. Mataric MJ (1993) Designing emergent behaviors: From local interactions to collective intelligence. In:
Proceedings of the Second International Conference on Simulation of Adaptive Behavior. pp. 432–441.
46. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland
AK, Ostrovski G et al. (2015) Human-level control through deep reinforcement learning. Nature 518(7540):
529–533.
47. Mordvintsev A, Randazzo E, Niklasson E and Levin M (2020) Growing neural cellular automata. Distill DOI:
10.23915/distill.00023. URL https://distill.pub/2020/growing-ca.
48. Ohsawa S, Akuzawa K, Matsushima T, Bezerra G, Iwasawa Y, Kajino H, Takenaka S and Matsuo Y (2018)
Neuron as an agent. URL https://openreview.net/forum?id=BkfEzz-0-.
49. OroojlooyJadid A and Hajinezhad D (2019) A review of cooperative multi-agent deep reinforcement learning.
arXiv preprint arXiv:1908.03963 .
50. Ott J (2020) Giving up control: Neurons as reinforcement learning agents. arXiv preprint arXiv:2003.11642 .
51. Palm RB, Duque MG, Sudhakaran S and Risi S (2022) Variational neural cellular automata. In: International
Conference on Learning Representations. URL https://openreview.net/forum?id=7fFO4cMBx_9.
52. Pathak D, Lu C, Darrell T, Isola P and Efros AA (2019) Learning to control self-assembling morphologies: a
study of generalization via modularity. arXiv preprint arXiv:1902.05546 .
53. Peng Z, Hui KM, Liu C, Zhou B et al. (2021) Learning to simulate self-driven particles system with coordinated
policy optimization. Advances in Neural Information Processing Systems 34.
54. Pickering A (2010) The cybernetic brain. University of Chicago Press.
55. Qin Y, Feng M, Lu H and Cottrell GW (2018) Hierarchical cellular automata for visual saliency. International
Journal of Computer Vision 126(7): 751–770.
56. Qu X, Sun Z, Ong YS, Gupta A and Wei P (2020) Minimalistic attacks: How little it takes to fool deep
reinforcement learning policies. IEEE Transactions on Cognitive and Developmental Systems .
57. Radford A, Kim JW, Hallacy C, Ramesh A, Goh G, Agarwal S, Sastry G, Askell A, Mishkin P, Clark J et al.
(2021) Learning transferable visual models from natural language supervision. arXiv preprint arXiv:2103.00020
.
58. Radford A, Narasimhan K, Salimans T and Sutskever I (2018) Improving language understanding by generative
pre-training .
59. Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I et al. (2019) Language models are unsupervised
multitask learners. OpenAI blog 1(8): 9.
60. Randazzo E, Mordvintsev A, Niklasson E, Levin M and Greydanus S (2020) Self-classifying mnist digits. Distill
DOI:10.23915/distill.00027.002. URL https://distill.pub/2020/selforg/mnist.
61. Resnick C, Eldridge W, Ha D, Britz D, Foerster J, Togelius J, Cho K and Bruna J (2018) Pommerman: A
multi-agent playground. arXiv preprint arXiv:1809.07124 .
62. Rubenstein M, Cornejo A and Nagpal R (2014) Programmable self-assembly in a thousand-robot swarm.
Science 345(6198): 795–799.
63. Rudin N, Hoeller D, Reist P and Hutter M (2021) Learning to walk in minutes using massively parallel deep
reinforcement learning. arXiv preprint arXiv:2109.11978 .
64. Sanchez-Lengeling B, Reif E, Pearce A and Wiltschko AB (2021) A gentle introduction to graph neural
networks. Distill 6(9): e33.
65. Sandler M, Vladymyrov M, Zhmoginov A, Miller N, Madams T, Jackson A and Arcas BAY (2021) Meta-
learning bidirectional update rules. In: International Conference on Machine Learning. PMLR, pp. 9288–9300.
66. Sandler M, Zhmoginov A, Luo L, Mordvintsev A, Randazzo E et al. (2020) Image segmentation via cellular
automata. arXiv preprint arXiv:2008.04965 .
67. Schilling MA (2000) Toward a general modular systems theory and its application to interfirm product
modularity. Academy of management review 25(2): 312–334.
68. Schilling MA and Steensma HK (2001) The use of modular organizational forms: An industry-level analysis.
Academy of management journal 44(6): 1149–1168.
69. Schmidhuber J (2014) Who invented backpropagation? URL https://people.idsia.ch/~juergen/who-invented-backpropagation.html.
70. Schmidhuber J (2020) Metalearning machines learn to learn (1987-). URL https://people.idsia.ch/~juergen/metalearning.html.
71. Schoenholz S and Cubuk ED (2020) Jax md: a framework for differentiable physics. Advances in Neural
Information Processing Systems 33.
72. Schweitzer F and Farmer JD (2003) Brownian agents and active particles: collective dynamics in the natural
and social sciences, volume 1. Springer.
73. Seeley TD (2010) Honeybee democracy. Princeton University Press.
74. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I,
Panneershelvam V, Lanctot M et al. (2016) Mastering the game of Go with deep neural networks and tree search.
Nature 529(7587): 484–489.
75. Simonyan K and Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv
preprint arXiv:1409.1556 .
76. Stahlberg F (2020) Neural machine translation: A review. Journal of Artificial Intelligence Research 69: 343–
418.
77. Stoy K, Brandt D and Christensen DJ (2010) Self-reconfigurable robots: An introduction. MIT Press.
78. Suarez J, Du Y, Isola P and Mordatch I (2019) Neural MMO: A massively multiagent game environment for
training and evaluating intelligent agents. arXiv preprint arXiv:1903.00784.
79. Suarez J, Du Y, Zhu C, Mordatch I and Isola P (2021) The Neural MMO platform for massively multiagent
research. In: Thirty-Fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
URL https://openreview.net/forum?id=J0d-I8yFtP.
80. Sudhakaran S, Grbic D, Li S, Katona A, Najarro E, Glanois C and Risi S (2021) Growing 3d artefacts and
functional machines with neural cellular automata. arXiv preprint arXiv:2103.08737 .
81. Sumpter DJ (2010) Collective animal behavior. Princeton University Press.
82. Surowiecki J (2005) The wisdom of crowds. Anchor.
83. Tan X, Qin T, Soong F and Liu TY (2021) A survey on neural speech synthesis. arXiv preprint arXiv:2106.15561
.
84. Tang Y and Ha D (2021) The sensory neuron as a transformer: Permutation-invariant neural networks for
reinforcement learning. In: Thirty-Fifth Conference on Neural Information Processing Systems. URL
https://openreview.net/forum?id=wtLW-Amuds. https://attentionneuron.github.io.
85. Tang Y, Nguyen D and Ha D (2020) Neuroevolution of self-interpretable agents. In: Proceedings of the Genetic
and Evolutionary Computation Conference. URL https://attentionagent.github.io.
86. Tang Y, Tian Y and Ha D (2022) Evojax: Hardware-accelerated neuroevolution. arXiv preprint
arXiv:2202.05008 .
87. Tapscott D and Williams AD (2008) Wikinomics: How mass collaboration changes everything. Penguin.
88. Terry JK, Black B, Jayakumar M, Hari A, Sullivan R, Santos L, Dieffendahl C, Williams NL, Lokesh Y, Horsch
C et al. (2020) Pettingzoo: Gym for multi-agent reinforcement learning. arXiv preprint arXiv:2009.14471 .
89. Toner J, Tu Y and Ramaswamy S (2005) Hydrodynamics and phases of flocks. Annals of Physics 318(1):
170–244.
90. Vinyals O, Babuschkin I, Czarnecki WM, Mathieu M, Dudzik A, Chung J, Choi DH, Powell R, Ewalds T,
Georgiev P et al. (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature
575(7782): 350–354.
91. Wang T, Liao R, Ba J and Fidler S (2018) Nervenet: Learning structured policy with graph neural networks. In:
International Conference on Learning Representations.
92. Wang Z, She Q and Ward TE (2021) Generative adversarial networks in computer vision: A survey and
taxonomy. ACM Computing Surveys (CSUR) 54(2): 1–38.
93. Wolfram S (2002) A new kind of science, volume 5. Wolfram media Champaign, IL.
94. Wu Z, Pan S, Chen F, Long G, Zhang C and Philip SY (2020) A comprehensive survey on graph neural networks.
IEEE transactions on neural networks and learning systems 32(1): 4–24.
95. Zhang D, Choi C, Kim J and Kim YM (2021) Learning to generate 3d shapes with generative cellular automata.
In: International Conference on Learning Representations. URL https://openreview.net/forum?
id=rABUmU3ulQh.
96. Zheng L, Yang J, Cai H, Zhou M, Zhang W, Wang J and Yu Y (2018) Magent: A many-agent reinforcement
learning platform for artificial collective intelligence. In: Proceedings of the AAAI Conference on Artificial
Intelligence, volume 32.
