REVIEW

How to Grow a Mind: Statistics, Structure, and Abstraction

Joshua B. Tenenbaum,1* Charles Kemp,2 Thomas L. Griffiths,3 Noah D. Goodman4

In coming to understand the world—in learning concepts, acquiring language, and grasping causal relations—our minds make inferences that appear to go far beyond the data available. How do we do it? This review describes recent approaches to reverse-engineering human learning and cognitive development and, in parallel, engineering more humanlike machine learning systems. Computational models that perform probabilistic inference over hierarchies of flexibly structured representations can address some of the deepest questions about the nature and origins of human thought: How does abstract knowledge guide learning and reasoning from sparse data? What forms does our knowledge take, across different domains and tasks? And how is that abstract knowledge itself acquired?

... moment when all these fields converged on a common paradigm for understanding the mind), the labels "Bayesian" or "probabilistic" are merely placeholders for a set of interrelated principles and theoretical claims. The key ideas can be thought of as proposals for how to answer three central questions:

1) How does abstract knowledge guide learning and inference from sparse data?
2) What forms does abstract knowledge take, across different domains and tasks?
3) How is abstract knowledge itself acquired?

We will illustrate the approach with a focus on two archetypal inductive problems: learning concepts and learning causal relations. We then ...
P(h|d) = P(d|h)P(h) / Σ_{h′∈H} P(d|h′)P(h′) ∝ P(d|h)P(h)    (1)
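As a concrete illustration of Eq. 1, the following Python sketch computes a posterior over a toy hypothesis space. The two number-concept hypotheses, their priors, and the strong-sampling likelihood are invented here for illustration; they are not taken from the article.

```python
# Minimal illustration of Eq. 1: P(h|d) = P(d|h)P(h) / sum over h' of P(d|h')P(h').
# The hypothesis space, prior, and data below are hypothetical, not from the article.

def posterior(prior, likelihood, data):
    """Return P(h|d) for every hypothesis h, given prior P(h) and likelihood P(d|h)."""
    unnormalized = {h: likelihood(data, h) * p for h, p in prior.items()}
    z = sum(unnormalized.values())            # denominator of Eq. 1
    return {h: u / z for h, u in unnormalized.items()}

# Toy concept-learning problem: is the hidden number concept "even numbers"
# or "multiples of 10" (both restricted to 1..100)?
extension = {"even": set(range(2, 101, 2)), "multiple_of_10": set(range(10, 101, 10))}
prior = {"even": 0.5, "multiple_of_10": 0.5}

def likelihood(data, h):
    """Strong sampling: each example is drawn uniformly at random from the concept."""
    ext = extension[h]
    if any(x not in ext for x in data):
        return 0.0
    return (1.0 / len(ext)) ** len(data)

print(posterior(prior, likelihood, data=[10, 30, 60]))
# Roughly {'even': 0.008, 'multiple_of_10': 0.992}: a few examples consistent with
# the smaller hypothesis already favor it strongly (the "size principle").
```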
[Fig. 2 (figure panels omitted; caption begins mid-sentence):] ... P(S|F) and P(D|S), the conditional probabilities that each level specifies for the level below. A search algorithm attempts to find both the form F and the structure S of that form that jointly maximize the posterior probability P(S,F|D), a function of the product of P(D|S) and P(S|F). (A) Given as data the features of animals, the algorithm finds a tree structure with intuitively sensible categories at multiple scales. (B) The same algorithm discovers that the voting patterns of U.S. Supreme Court judges are best explained by a linear "left-right" spectrum. (C) Subjective similarities among colors are best explained by a circular ring. (D) Given proximities between cities on the globe, the algorithm discovers a cylindrical representation analogous to latitude and longitude: the cross product of a ring and a chain. (E) Given images of realistically synthesized faces varying in two dimensions, race and masculinity, the algorithm successfully recovers the underlying two-dimensional grid structure: a cross product of two chains.
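The search described in this caption can be summarized, very schematically, as a joint maximization over forms and structures. The sketch below is not the authors' algorithm; the candidate generator and both scoring functions are hypothetical placeholders standing in for the graph grammars and graph-based likelihoods of the real model.

```python
import math

# A schematic sketch, not the authors' code: jointly choose a form F and a structure S
# of that form to maximize P(S, F | D), which is proportional to P(D | S) P(S | F).
# The candidate generator and both scoring functions are hypothetical placeholders.

def best_form_and_structure(data, forms, structures_for, log_p_d_given_s, log_p_s_given_f):
    """Exhaustively score every candidate (form, structure) pair and keep the best."""
    best, best_score = None, -math.inf
    for form in forms:
        for structure in structures_for(form, data):
            score = log_p_d_given_s(data, structure) + log_p_s_given_f(structure, form)
            if score > best_score:
                best, best_score = (form, structure), score
    return best, best_score

# Toy usage: a "structure" is just (form, number of nodes); the fake fit term improves
# with more nodes and with the (assumed) right form, while the fake grammar prior
# penalizes larger graphs, so the winner balances fit against complexity.
forms = ["chain", "ring", "tree"]
mismatch = {"chain": 2.0, "ring": 1.5, "tree": 1.0}      # pretend the data are tree-like
structures_for = lambda form, data: [(form, k) for k in (2, 4, 8, 16)]
log_p_d_given_s = lambda data, s: -mismatch[s[0]] * 16.0 / s[1]
log_p_s_given_f = lambda s, f: -1.2 * s[1]

print(best_form_and_structure(None, forms, structures_for, log_p_d_given_s, log_p_s_given_f))
# The tree form wins at an intermediate size: (('tree', 4), score near -8.8).
```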
... if different domains of cognition are represented in qualitatively different ways, those forms must be innate (43, 44); connectionists have suggested these representations may be learned but in a generic system of associative weights that at best only approximates trees, causal networks, and other forms of structure people appear to know explicitly (14).

Recently cognitive modelers have begun to answer these challenges by combining the structured knowledge representations described above with state-of-the-art tools from Bayesian statistics. Hierarchical Bayesian models (HBMs) (45) address the origins of hypothesis spaces and priors by positing not just a single level of hypotheses to explain the data but multiple levels: hypothesis spaces of hypothesis spaces, with priors on priors. Each level of a HBM generates a probability distribution on variables at the level below. Bayesian inference across all levels allows hypotheses and priors needed for a specific learning task to themselves be learned at larger or longer time scales, at the same time as they constrain lower-level learning. In machine learning and artificial intelligence (AI), HBMs have primarily been used for transfer learning: the acquisition of inductive constraints from experience in previous related tasks (46). Transfer learning is critical for humans as well (SOM text and figs. S1 and S2), but here we focus on the role of HBMs in explaining how people acquire the right forms of abstract knowledge.
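To make "priors on priors" concrete, here is a minimal sketch of a two-level Bayesian model, loosely in the spirit of HBM-based transfer learning rather than any specific published model: hyperparameters of a shared Beta prior are inferred from several related tasks and then reused to generalize from a single observation in a new task. The tasks, data, and hyperparameter grid are invented for illustration.

```python
import math

# A minimal two-level sketch (not the authors' model): infer hyperparameters of a
# shared Beta prior from several related tasks, then reuse the learned prior to
# generalize from one observation in a new task. All numbers are made up.

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(k, n, a, b):
    """Beta-binomial marginal likelihood of k successes in n trials under Beta(a, b)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + log_beta(k + a, n - k + b) - log_beta(a, b))

# Previous "tasks": each bag of marbles is nearly all one color (20 draws per bag).
bags = [(19, 20), (1, 20), (20, 20), (0, 20), (18, 20)]   # (black draws, total draws)

# Level-2 inference: posterior over hyperparameters (a, b) of the shared Beta prior.
grid = [(a, b) for a in (0.1, 0.5, 1.0, 2.0, 5.0) for b in (0.1, 0.5, 1.0, 2.0, 5.0)]
log_post = {(a, b): sum(log_marginal(k, n, a, b) for k, n in bags) for a, b in grid}
shift = max(log_post.values())
post = {h: math.exp(lp - shift) for h, lp in log_post.items()}
norm = sum(post.values())
post = {h: p / norm for h, p in post.items()}

# Level-1 prediction: after one black draw from a brand-new bag, how likely is the
# next draw to be black? Average the Beta posterior mean over the learned hyperprior.
pred = sum(p * (1 + a) / (1 + a + b) for (a, b), p in post.items())
print(round(pred, 3))   # well above 0.5: the learned overhypothesis says bags
                        # tend to be uniform in color, so one draw generalizes far
```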
[Fig. 3, panels A to D (graphics omitted): the true disease-symptom network over variables 1 to 16, the networks learned by the two- and three-level models from n = 20 and n = 80 patients, and the model levels labeled Abstract principles, Structure, and Data (events over variables).]
Fig. 3. HBMs defined over graph schemas can explain how intuitive theories are acquired and used to learn about specific causal relations from limited data (38). (A) A simple medical reasoning domain might be described by relations among 16 variables: The first six encode presence or absence of "diseases" (top row), with causal links to the next 10 "symptoms" (bottom row). This network can also be visualized as a matrix (top right, links shown in black). The causal learning task is to reconstruct this network based on observing data D on the states of these 16 variables in a set of patients. (B) A two-level HBM formalizes bottom-up causal learning or learning with an uninformative prior on networks. The bottom level is the data matrix D. The second level (structure) encodes hypothesized causal networks: a grayscale matrix visualizes the posterior probability that each pairwise causal link exists, conditioned on observing n patients; compare this matrix with the black-and-white ground truth matrix shown in (A). The true causal network can be recovered perfectly only from observing very many patients (n = 1000; not shown). With n = 80, spurious links (gray squares) are inferred, and with n = 20 almost none of the true structure is detected. (C) A three-level nonparametric HBM (48) adds a level of abstract principles, represented by a graph schema. The schema encodes a prior on the level below (causal network structure) that constrains and thereby accelerates causal learning. Both schema and network structure are learned from the same data observed in (B). The schema discovers the disease-symptom framework theory by assigning variables 1 to 6 to class C1, variables 7 to 16 to class C2, and a prior favoring only C1 → C2 links. These assignments, along with the effective number of classes (here, two), are inferred automatically via the Bayesian Occam's razor. Although this three-level model has many more degrees of freedom than the model in (B), learning is faster and more accurate. With n = 80 patients, the causal network is identified near perfectly. Even n = 20 patients are sufficient to learn the high-level C1 → C2 schema and thereby to limit uncertainty at the network level to just the question of which diseases cause which symptoms. (D) A HBM for learning an abstract theory of causality (62). At the highest level are laws expressed in first-order logic representing the abstract properties of causal relationships, the role of exogenous interventions in defining the direction of causality, and features that may mark an event as an exogenous intervention. These laws place constraints on possible directed graphical models at the level below, which in turn are used to explain patterns of observed events over variables. Given observed events from several different causal systems, each encoded in a distinct data matrix, and a hypothesis space of possible laws at the highest level, the model converges quickly on a correct theory of intervention-based causality and uses that theory to constrain inferences about the specific causal networks underlying the different systems at the level below.
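The way a learned schema constrains network-level learning can be sketched as a structured prior over causal links. The fragment below is a simplified illustration, not the published model: the class assignments and class-level link probabilities are assumed values, used only to show why schema-consistent networks receive far higher prior probability than schema-violating ones.

```python
import math

# A minimal sketch, assuming a simplified stand-in for the graph schema in Fig. 3C:
# each variable belongs to a class, and the schema assigns every ordered pair of
# classes a probability that a causal link exists. The class assignments and link
# probabilities below are illustrative values, not the published model's.

classes = {v: ("C1" if v <= 6 else "C2") for v in range(1, 17)}   # 6 diseases, 10 symptoms
link_prob = {("C1", "C2"): 0.3,                                    # disease -> symptom: plausible
             ("C1", "C1"): 0.01, ("C2", "C1"): 0.01, ("C2", "C2"): 0.01}

def log_prior(network):
    """log P(network | schema): each possible link is an independent Bernoulli draw
    whose rate depends only on the classes of the two variables it connects."""
    lp = 0.0
    for i in range(1, 17):
        for j in range(1, 17):
            if i == j:
                continue
            p = link_prob[(classes[i], classes[j])]
            lp += math.log(p) if (i, j) in network else math.log(1.0 - p)
    return lp

# Three disease -> symptom links are far more probable under the schema than three
# links placed against it, which is why fewer patients suffice once the schema is
# known (compare panels B and C of Fig. 3).
schema_consistent = {(1, 7), (2, 8), (3, 9)}
schema_violating = {(7, 1), (8, 2), (15, 16)}
print(log_prior(schema_consistent) > log_prior(schema_violating))   # True
```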
Kemp and Tenenbaum (36, 47) showed how HBMs defined over graph- and grammar-based representations can discover the form of structure governing similarity in a domain. Structures of different forms—trees, clusters, spaces, rings, orders, and so on—can all be represented as graphs, whereas the abstract principles underlying each form are expressed as simple grammatical rules for growing graphs of that form. Embedded in a hierarchical Bayesian framework, this approach can discover the correct forms of structure (the grammars) for many real-world domains, along with the best structure (the graph) of the appropriate form (Fig. 2). In particular, it can infer that a hierarchical organization for the novel objects in Fig. 1A (such as Fig. 1B) better fits the similarities people see in these objects, compared to alternative representations such as a two-dimensional space.

Hierarchical Bayesian models can also be used to learn abstract causal knowledge, such as the framework theory of diseases and symptoms (Fig. 3), and other simple forms of intuitive theories (38). Mansinghka et al. (48) showed how a graph schema representing two classes of variables, diseases and symptoms, and a preference for causal links running from disease to symptom variables can be learned from the same data that support learning causal links between specific diseases and symptoms and be learned just as fast or faster (Fig. 3, B and C). The learned schema in turn dramatically accelerates learning of specific causal relations (the ...