Coopmans Et Al 2022 - Preprint - Rev
A formal comparison
Cas W. Coopmans1,2 *, Karthikeya Kaushik1,3 , & Andrea E. Martin1,3
1 Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
2 Centre for Language Studies, Radboud University, Nijmegen, The Netherlands
3 Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, The Netherlands
Abstract
Since the cognitive revolution, language and action have been compared as cognitive systems, with cross-
domain convergent views recently gaining renewed interest in biology, neuroscience, and cognitive science.
Language and action are both combinatorial systems whose mode of combination has been argued to be
hierarchical, combining elements into constituents of increasingly larger size. This structural similarity
has led to the suggestion that they rely on shared cognitive and neural resources. In this paper, we
compare the conceptual and formal properties of hierarchy in language and action using set theory. We
show that the strong compositionality of language requires a particular formalism, a magma, to describe
the algebraic structure corresponding to the set of hierarchical structures underlying sentences. When
this formalism is applied to actions, it appears to be both too strong and too weak. To overcome these
limitations, which are related to the weak compositionality and sequential nature of action structures, we
formalize the algebraic structure corresponding to the set of actions as a trace monoid. We aim to capture
the different system properties of language and action in terms of the distinction between hierarchical sets
and hierarchical sequences, and discuss the implications for the way both systems could be represented
in the brain.
Keywords: Hierarchy, syntax, compositionality, formal modeling, set theory
*Contact details:
Cas W. Coopmans
Max Planck Institute for Psycholinguistics
Wundtlaan 1, 6525 XD Nijmegen, The Netherlands
Tel.: +31-24 3521331
Email address: [email protected]
1 Introduction
It has long been recognized that both language and action are structurally organized in a way that is not
immediately evident from their serial appearance. In the 1950s, Lashley (1951) and Chomsky (1959) separately
showed that then-dominant behaviorist “chaining” theories based on contiguous stimulus-response
associations could not account for ‘serial behavior’, such as language production and action execution. Instead,
these behaviors appear to be controlled by internal, hierarchically organized plans, which allow human
behavior to be creative, productive and flexible. Since then, similarities between language and action have
often been noted (e.g., Corballis, 1991; Greenfield, 1991; Holloway, 1969; Miller et al., 1960), and more recent
studies propose that language and action are analogous in their hierarchical organization (Fitch & Martins,
2014; Fujita, 2014; Jackendoff, 2007; Pulvermüller & Fadiga, 2010; Stout & Chaminade, 2009).
Such proposals about cross-domain convergence are desirable from an evolutionary perspective, in which
one seeks to find a set of primitives that account for the distinguishing features of the human mind (Boeckx
& Fujita, 2014; de Waal & Ferrari, 2010; Hauser et al., 2002; Marcus, 2006). However, arguments in favor
of the analogy between language and action are formally underspecified. It is possible to draw a hierarchical
tree structure over any sequence, but what is needed is independent empirical evidence that this structure
describes or explains a phenomenon in the natural world (Berwick & Chomsky, 2017; Bloom, 1994; Fitch
& Martins, 2014; Moro, 2014a). In other words, superficial resemblance is insufficient: “we cannot just
observe that hierarchical structures are found in motor control (e.g., tool construction), and thereby claim
that these are directly related to the hierarchical structures of language ... Rather, it is necessary to develop
a functional description of the cognitive structures in question, parallel to that for language ... so we can
look for finer-scale commonalities” (Jackendoff, 2002, p. 80).
While formal linguistics has provided many accounts of the specific properties of hierarchy in language,
such a formal characterization in the domain of actions and action plans is lacking (but see Steedman, 2002
for an exception). To this end, the aim of this paper is to characterize the similarities and differences between
the hierarchical structures in language and action in both conceptual and formal terms (cf. Guest & Martin,
2021; van Rooij & Blokpoel, 2020). The paper is structured as follows: in Section 2, we discuss the type
of data that shows that the syntax of natural languages is organized hierarchically, after which we list the
core properties of such hierarchical syntactic structure (Section 2.1). In Section 2.2, we formally describe
these structures in a domain-neutral way using the mathematical language of set theory. We then show
that this formalism is inadequate for describing the action system (Section 3.1) and suggest an alternative
formalism to characterize its properties (Section 3.2). In Section 4, we conclude that the properties of
syntactic hierarchy are not found in action structures (Section 4.2) and discuss this conclusion in light of the
idea that syntactic representations are fundamentally hierarchical sets, while actions are better conceived of
as hierarchical sequences (Section 4.3). We end by discussing the implications for how language and action
might be represented in the brain.
The structure dependence of meaning shows that language is compositional. To be able to compare
combinatorial systems, such as language and action, we distinguish between strong and weak compositionality
(Pagin & Westerståhl, 2010). In a strongly compositional system, the meaning of a constructed unit is a
function of the meanings of its constituents and the way in which these are structurally combined (Partee et
al., 1993; Partee, 1995). In a weakly compositional system, instead, the meaning of a constructed unit is a
function of the meaning of the elements and the total construction (i.e., the result of an operation applied over
the total construction of ordered elements; Pagin & Westerståhl, 2010). A weakly compositional system can
thus distinguish the meanings of “John likes Mary” and “Mary likes John”, because their total constructions
differ. However, weakly compositional systems cannot capture structural ambiguity. Because they do not
take into account the structural relationships between intermediate representations, such as between the
different constituents in (1), they are unable to distinguish the two interpretations of “the woman saw the
man with binoculars”.
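(1) [Tree diagrams not reproduced. (a) The PP “with binoculars” attaches inside the object NP (the man has the binoculars). (b) The PP attaches to the VP, so that “saw the man” forms a VP constituent (the woman has the binoculars). (c) As in (b), with “did so” in place of the inner VP.]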
A second source of evidence for constituent structure is that syntactic operations, such as deletion and
substitution, target constituents rather than words or mere word sequences. For instance, the phrase did
so can substitute for a verbal word sequence, such as “saw the man”, if this sequence forms a constituent.
Because the words “saw the man” form an isolated constituent only in the structure of (1b), the sentence
“the woman saw the man with binoculars and the boy did so with field glasses” (corresponding to (1c)) can
only mean that the boy is holding the field glasses (analogous to the interpretation of (1b)), not the man. In
sum, both semantic interpretation and syntactic operations are structure-dependent: they refer to hierarchical
constituent structures rather than to linear sequences of words, with the result that word sequences that do
not form constituents are not available to semantic interpretation nor to syntactic operations.
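The substitution test described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: nested tuples stand in for the two structures in (1a) and (1b), "the woman" and "the man" are treated as unanalyzed units, and the helper names `words` and `substitute` are hypothetical.

```python
# Nested tuples stand in for constituent structure (an illustrative
# encoding, not the set-based formalism introduced later in the paper).
S_1A = ("the woman", ("saw", ("the man", ("with", "binoculars"))))
S_1B = ("the woman", (("saw", "the man"), ("with", "binoculars")))


def words(node):
    """Return the left-to-right word string spanned by a node."""
    if isinstance(node, str):
        return node
    return " ".join(words(child) for child in node)


def substitute(node, target, replacement):
    """Replace the constituent spelled `target`; return None if no
    constituent of `node` spells exactly that word string."""
    if words(node) == target:
        return replacement
    if isinstance(node, str):
        return None
    for i, child in enumerate(node):
        new_child = substitute(child, target, replacement)
        if new_child is not None:
            return node[:i] + (new_child,) + node[i + 1:]
    return None


same_string = words(S_1A) == words(S_1B)
result_1a = substitute(S_1A, "saw the man", "did so")
result_1b = substitute(S_1B, "saw the man", "did so")
```

Both structures spell out the same word string, but the substitution succeeds only for the (1b)-style structure, where "saw the man" is a constituent; for the (1a)-style structure the function returns `None`.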
is “higher” in the structure, has a structurally more prominent position than the elements α and β.
In the remainder of this paper, we will assume that the combinatorial procedure for generating structure
is formally equivalent to binary set formation. On this assumption, the hierarchical structure of syntax has
the following properties:¹
1. Unbounded. Human language use is creative: language users can produce and understand sentences
that have never been produced before. Specifying such an open-ended capacity using finite means
requires recursive procedures, such as the recursive combinatorial operation defined above. Given the
controversy surrounding (the importance of) recursion and the various uses of the term in different
disciplines (Fitch, 2010; Martins, 2012; Watumull et al., 2014), we should first be explicit about the
relationship between recursion and hierarchy.
While the hierarchical structure of syntax is generated by recursion, hierarchy and recursion are two
independent properties. Hierarchy is a property of the output generated by the combinatorial operation
(i.e., a property of its extension). Recursion, instead, is a property of a function defined in intension. A
recursive function is a function which can apply indefinitely to its own output (‘self-reference’), which
leads to structurally ‘self-similar’ output, in which a unit of a specific type is contained in another unit of
the same type (in linguistics, this is often called ‘self-embedding’: embedding of one thing into another
thing of the same kind). This results in a hierarchical structure which displays similar properties across
different levels of embedding, clearly visible in the repetition of complement clauses like He said that
she believes that he thought . . . , which is a sentence within a sentence within a sentence. Because a
recursive function is defined in intension rather than in extension, and thus the (infinite) computable
set is not the same as the (finite) set actually produced by the function, the recursivity of a function
should not be equated with its output. Therefore, absence of ‘self-similar’ output does not warrant the
conclusion that the function which generated the output is not recursive (Watumull et al., 2014; Hauser
et al., 2014), and neither does the presence of ‘self-similar’ output warrant the definitive conclusion that
it was generated via recursion (Martins, 2012).
The independence of hierarchy and recursion is further illustrated by the fact that they doubly
dissociate: not all hierarchical objects are generated by recursion and not all recursive functions generate
hierarchical structure. For instance, artificial grammars that generate sequences of the type (ab)ⁿ and
aⁿbⁿ can be recursive (e.g., via respectively f : S → abS and f : S → aSb), but only the latter generates
hierarchical structure.² Conversely, the syllable structure in phonology is hierarchical but not recursive.
A syllable contains an onset and a rhyme, with the latter consisting of a nucleus and a coda. This
hierarchy is not recursive: a syllable cannot be embedded in another syllable.
2. Endocentric. The categorial status of a constituent is determined by one of its elements (the ‘head’):
the set {α, β} can be of type α or β, but not of type γ. Endocentric structures are contrasted with
exocentric structures, in which the label of a composed unit is not determined by one of its elements.³
Labels allow phrases to be called upon by interpretive and formal procedures, thereby determining their
distributional behavior. To give an example, the set {eat, cookies} is a verb phrase, which has ‘eat-like’
(interpretative) semantic properties and ‘verb-like’ (formal) syntactic properties, both inherited from
the verb “eat”. That this is the case can be seen from the fact that “eat cookies” can take the place of
the verb “eat” in “He likes to eat”, yielding “He likes to eat cookies”. It cannot, however, take the place
of the noun “cookies” in “He likes chocolate cookies”, as is clear from the ill-formedness of “He likes
chocolate eat cookies”. The label of a composed unit thus places a constraint on further computation,
1 More properties of language can be derived from the minimal assumption that the structure-building procedure is binary
set formation (see Hornstein, 2017 and Rizzi, 2013 for comprehensive lists of properties). However, many of these properties,
such as displacement, do not have clear analogues in actions (Moro, 2015; Pulvermüller, 2014). Because our aim is to compare
the (formal) properties that language and actions might share, we focus on the properties of hierarchy listed here.
2 The grammars that generate (ab)ⁿ and aⁿbⁿ sequences can be implemented recursively, though they do not have to be.
These sequences can also be generated with iterative functions that are not recursive, i.e., do not call themselves (Fitch, 2010;
Jackendoff, 2011). Iterative functions also realize unboundedness, but they do so by creating sequences without internal structure.
3 How it is determined which element defines the label of the phrase is still a much-debated question and is outside the scope
of this paper (see e.g., Boeckx, 2009; Chomsky, 2013; Fukui, 2011). What is important here is not how phrases get their labels,
but that they get them from one of their elements. Moreover, by using the term ‘labels’ we only refer to the fact that the
combined unit is of the same type as one of its elements. Whether these labels reflect phrasal projections from the syntactic
category of a lexical item (as in X-bar theory, Jackendoff, 1977) or rather the lexical item itself (as in bare phrase structure,
Chomsky, 1995a) is not critical for our purposes.
restricting the elements with which it can combine: given that {eat, cookies} is a verb phrase and not
a noun phrase, it can combine with adverbs but not with adjectives.
Endocentricity is intricately linked to recursivity, because the combinatorial operation can only be
said to apply recursively if its output is of the same type as its input (Boeckx, 2009; Hornstein, 2009;
Watumull et al., 2014). Similar to recursivity, endocentricity is a distinctive property of syntactic
hierarchy, as not all linguistic structures are endocentric.
3. Unordered. Because the combinatorial operation is defined as binary set formation, no order is
imposed on the members in the combined set. While the unordered structure has to be linearized for
spoken language production, differences in linear order do not feed differences in semantic interpretation,
and syntactic operations do not refer to linear order. Different languages (and different modalities) can
seem highly different in terms of the linear ordering of their words (e.g., whether heads precede or follow
their dependents), which is a fundamental source of cross-linguistic variation (see Section 4). However,
in terms of the compositional structure generated by Merge, which is what we are concerned with, these
languages show consistent similarities. Note that the assumption about unorderedness is specific to the
definition of Merge as binary set formation, and might not be shared in other linguistic frameworks.⁴
What these frameworks do agree on, however, is that syntactic operations are structure-dependent, not
order-dependent.
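The dissociation between recursion and hierarchy discussed under the first property can be illustrated with a short Python sketch. It assumes a terminating base case (S → ε) that the rewrite rules in the text leave implicit, and all function names are hypothetical. The same strings are produced recursively and iteratively, while only the center-embedding rule builds nested structure.

```python
def ab_n_recursive(n):
    """(ab)^n via the rule S -> abS, with S -> '' assumed as base case."""
    return "" if n == 0 else "ab" + ab_n_recursive(n - 1)


def an_bn_recursive(n):
    """a^n b^n via the rule S -> aSb, with S -> '' assumed as base case."""
    return "" if n == 0 else "a" + an_bn_recursive(n - 1) + "b"


def ab_n_iterative(n):
    """(ab)^n without self-reference: plain repetition."""
    return "ab" * n


def an_bn_iterative(n):
    """a^n b^n without self-reference: count, then repeat."""
    return "a" * n + "b" * n


def an_bn_structure(n):
    """The nested bracketing that S -> aSb builds along the way."""
    return () if n == 0 else ("a", an_bn_structure(n - 1), "b")
```

The recursive and iterative versions are extensionally identical, which is why the form of the output alone does not reveal whether the generating function was recursive.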
This conception of structure building as binary set formation allows us to derive both compositionality and
structure dependence. First, the structure of the input to the combinatorial operation is preserved in its
output. Thus, if α and β are constituents (or sets) in the input, they are constituents (or sets) in the
output as well: new elements can only be added on top of the already formed set, not inside it. Because the
structure of every combination is retained at each level of the hierarchy, the hierarchical structure is strongly
compositional. This is well illustrated with a structurally ambiguous phrase: {deep, {blue, sea}} is not the
same as {{deep, blue}, sea}. Note that if the structure were not retained after recursive combination, it
would be possible to derive from {blue, sea} not only {deep, {blue, sea}} but also {{deep, blue}, sea}. That
would make it impossible to account for the ambiguity of the phrase.
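As a minimal sketch of this point, binary set formation can be modeled with Python's `frozenset`. The function name `merge` is an assumption made for illustration; the paper itself only defines the operation abstractly.

```python
def merge(alpha, beta):
    """Binary set formation: Merge(alpha, beta) = {alpha, beta}."""
    return frozenset({alpha, beta})


right_branching = merge("deep", merge("blue", "sea"))  # {deep, {blue, sea}}
left_branching = merge(merge("deep", "blue"), "sea")   # {{deep, blue}, sea}

# No order is imposed on the members of a set.
unordered = merge("blue", "sea") == merge("sea", "blue")

# The order of application matters: the two bracketings are distinct objects.
non_associative = right_branching != left_branching

# The input {blue, sea} survives intact as a member of the output.
input_preserved = merge("blue", "sea") in right_branching
```

Because the inner set is preserved as a member of the outer one, the two readings of "deep blue sea" remain distinguishable, which is what strong compositionality requires.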
Moreover, recursively generated sets describe hierarchical relations but not linear ordering relations.
Therefore, syntactic operations that refer to these sets can only refer to their structure, and hence be structure-
dependent, but not to their order. Order-dependent operations are also ruled out by recursion: because it is
always possible to recursively insert material between two items and thereby change the linear position of the
words (e.g., “the boy swims” → “the boy with muscular arms swims”), no operation can refer to the linear
position of elements in a sequence.
We should note that the properties we described above are properties of a cognitive capacity, which can
be expressed in varying degrees in natural languages (e.g., exocentricity might be found in certain subject-
predicate relations). Moreover, the faculty of language is capable of assigning strongly compositional
interpretations to most sentences, as is required to derive the multiple interpretations of structurally ambiguous
sentences, but it can assign other interpretations as well (e.g., to non-decomposable idioms; Baggio, 2021;
Jackendoff, 2002). In other words, we listed properties that a model of (the faculty of) language must have,
even though these need not be found in all constructions in all languages. As we aim to illustrate how the
action system differs from the language system, we will focus on the capacity for strong compositionality as
a fundamental difference between both systems.
4 See Saito and Fukui (1998) and Kayne (2011), who argue that Merge(α,β) forms the ordered pair hα, βi. This makes
2.2.1 Generating structures
Definition 1. (M, ⊕, ∅) is a unital, commutative magma generated from a set W , where
1. W is the set of words that represents the lexicon of a language.
A unital, commutative magma (henceforth referred to as a magma for conciseness) is an algebraic structure
(see Box 1), whose operation we define as binary set formation following the formal definition of Merge
described in Section 2.1. This allows us to derive a number of important properties. First, the magma
axiom (closure) states that for any two members a, b ∈ M, applying ⊕ to a and b generates a member
of M, which yields unbounded generation. Second, ⊕ does not introduce labels, so the label of each set
is derived from one of its elements (i.e., endocentricity; see Chomsky, 2013; Collins, 2017).⁵ Third, all
elements in M are unordered sets. And fourth, ⊕ is non-associative, which means that the order in which ⊕
is applied affects the structure that is generated (Fukui & Zushi, 2004). In other words, the structures that
are generated are strongly compositional: their meaning is a function of the meanings of their parts and the
way in which they are structurally combined.
Box 1. Algebraic structures
An algebraic structure consists of a nonempty set X (called the carrier set), a collection of finitary
operations on X (typically binary operations), and a finite set of axioms that these operations must
satisfy. To illustrate the relevant axioms for the current work, we consider X as the set containing all
elements of the algebraic structure, together with a binary operation acting on the elements of X.
Depending on the axioms they satisfy, the algebraic structures form a taxonomy. Presented below is a
subset of this taxonomy, in which we highlight both the algebraic structures that are relevant for the
current work as well as their corresponding axioms.
Structure                    Axioms
Magma                        Closed
Unital commutative magma     Closed, unital, commutative
Monoid                       Closed, unital, associative
Commutative monoid           Closed, unital, associative, commutative
Trace monoid                 Closed, unital, associative, partially commutative
Without further constraints, a freely generated magma would contain elements that should not be constituents,
such as {{eat}, {happy}}. To avoid this without modifying the formal properties of ⊕, the lexical
5 Labels are a convenient way to group together structures with identical formal properties. In our formal setup, constituent
labels are simply part labels whose union produces the set of all grammatical structures. For example, with W = {dog, man, big},
the label N would be the part {man, dog}, A is {big}, and NP is {{big, dog}, {big, man}}. Therefore, M = M_N ∪ M_A ∪ M_NP.
items themselves must determine which structures are licensed and which are not. That is, the application of
⊕ is constrained by selectional restrictions on its input (i.e., which categories can(not) combine with which
other categories). For instance, {{eat}, {happy}} is excluded because verbs do not combine with adjectives.
The same restrictions apply when the output of ⊕ is recursively used as its input. For example, the set
{_V {eat}, {cookies}} cannot combine with the adjective “happy” because the former is labeled as a type of
verb rather than as a type of noun. Such illegitimate combinations are excluded by taking the (grammatically
licensed) subset of the freely generated magma.
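One way to picture this filtering step is the sketch below. It is a toy illustration only: the lexicon `CATEGORY`, the selection table `LICENSED`, and the helper names are hypothetical, lexical items are bare strings rather than singleton sets, and the table directly records the label of the head instead of deriving it.

```python
# Hypothetical toy lexicon: each word is assigned a category label.
CATEGORY = {"eat": "V", "cookies": "N", "happy": "A", "chocolate": "A"}

# Hypothetical selection table: which pairs of labels may combine, and
# which of the two labels the result inherits (endocentricity).
LICENSED = {
    frozenset({"V", "N"}): "V",  # verb + noun phrase -> verb phrase
    frozenset({"A", "N"}): "N",  # adjective + noun -> noun phrase
}


def merge(alpha, beta):
    """Unconstrained binary set formation."""
    return frozenset({alpha, beta})


def label(item):
    """Label of a word or of a licensed set; None if it is not licensed."""
    if isinstance(item, str):
        return CATEGORY[item]
    if len(item) != 2:
        return None
    first, second = tuple(item)
    labels = (label(first), label(second))
    if None in labels:
        return None
    return LICENSED.get(frozenset(labels))


def licensed(item):
    """True if the structure belongs to the grammatically licensed subset."""
    return label(item) is not None


eat_cookies = merge("eat", "cookies")
```

Here `merge` itself is left untouched, and ill-formed combinations are excluded only afterwards by `licensed`, mirroring the idea that the restriction comes from the lexical items rather than from the operation.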
We make the relationship between these constituent structures explicit by defining a binary relationship
between the elements of the magma, turning it into a partially ordered magma.
Definition 2. (M, ⊕, ∅, ≤) is a partially ordered magma, where ≤ is a containment relationship between
the elements in M that is reflexive, transitive, and antisymmetric.
The relation ≤ on the set M reflects containment or set-inclusion, which corresponds to the dominance
relation commonly used in linguistics. Thus, x1 ≤ x2 means that x2 contains (and thus dominates) x1 . As
a visualization of this partially ordered magma, consider the Hasse diagram in Figure 1, which displays the
containment relationship for two structures that map onto the sequence “woman saw man with binoculars”.
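The containment relation of Definition 2 can be checked mechanically on a small example. The following sketch assumes the `frozenset` encoding of binary set formation, uses bare strings for lexical items, and introduces the hypothetical helpers `parts` and `leq`.

```python
def merge(a, b):
    """Binary set formation."""
    return frozenset({a, b})


def parts(x):
    """Everything contained in x, including x itself."""
    found = {x}
    if isinstance(x, frozenset):
        for member in x:
            found |= parts(member)
    return found


def leq(x, y):
    """x <= y holds when y contains (dominates) x."""
    return x in parts(y)


# The structure in which "with binoculars" attaches to "man".
pp = merge("with", "binoculars")
np_ = merge("man", pp)
vp = merge("saw", np_)
sentence = merge("woman", vp)

elements = parts(sentence)
reflexive = all(leq(x, x) for x in elements)
antisymmetric = all(
    x == y or not (leq(x, y) and leq(y, x))
    for x in elements for y in elements
)
transitive = all(
    leq(x, z) or not (leq(x, y) and leq(y, z))
    for x in elements for y in elements for z in elements
)
# The order is partial, not total: "saw" and the PP are unrelated.
incomparable = not leq("saw", pp) and not leq(pp, "saw")
```

Direct containment, as drawn by the arrows of a Hasse diagram, corresponds to plain set membership in this encoding (`x in y`), while `leq` is its reflexive and transitive closure.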
Box 2. Ordered sets
An ordered set X is a set ordered by a binary relation, denoted here with infix notation ≤, such that
∀x, y, z ∈ X, the following axioms may hold (depending on the kind of order):
(1.) x ≤ x; reflexive
When the binary relation is transitive and antisymmetric, the set is called partially ordered. A totally
ordered set is an ordered set whose binary relation holds between all elements of the set. When a
relationship is only total when restricted to X′, which is a subset of X, we consider X′ locally total
(Kayne, 1994). We therefore say that ∀x, y ∈ X′ ⊂ X, x ≤ y or y ≤ x.
Besides containment, there is another relevant structural relation between the elements in constituent structures.
This relationship, called c-command (Reinhart, 1983), describes the scope domain of a node in the
tree structure. Specifically, a node α is said to asymmetrically c-command a node β iff β is contained in the
sister node of α (e.g., in (1a) “saw” c-commands every node contained in the higher NP).
Definition 3. For m1, m2 ∈ M, m1 c-commands m2 (denoted m1 ⊳ m2) if neither contains the other (m1 ≰ m2 and m2 ≰ m1), and
∃!m = {m1, x} ∈ M, and m2 ≤ x. An asymmetric c-command relationship exists between m1 and m2 if
m1 ⊳ m2 and m2 ⋫ m1. Asymmetric c-command is irreflexive, transitive, antisymmetric, and locally total.
Given Definition 3, asymmetric c-command is a locally total relation on non-terminal nodes in the tree
structure.⁶ The Hasse diagram in Figure 2 visualizes the asymmetric c-command relationship for the two
structures that map onto the sequence “woman saw man with binoculars”.
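To make the c-command relation concrete, the sketch below computes it for the two structures that map onto "woman saw man with binoculars". It assumes the `frozenset` encoding of binary sets, bare strings for lexical items, and that each word occurs only once in a structure; the helper names are hypothetical.

```python
def merge(a, b):
    """Binary set formation."""
    return frozenset({a, b})


def parts(x):
    """Everything contained in x, including x itself."""
    found = {x}
    if isinstance(x, frozenset):
        for member in x:
            found |= parts(member)
    return found


def c_command_pairs(root):
    """All pairs (m1, m2) such that m1 c-commands m2 within root:
    m1 has a sister x in some binary set {m1, x}, and x contains m2."""
    pairs = set()
    for node in parts(root):
        if isinstance(node, frozenset) and len(node) == 2:
            left, right = tuple(node)
            pairs |= {(left, m2) for m2 in parts(right)}
            pairs |= {(right, m2) for m2 in parts(left)}
    return pairs


def asymmetric(pairs):
    """Keep only the pairs whose converse does not hold."""
    return {(m1, m2) for (m1, m2) in pairs if (m2, m1) not in pairs}


pp = merge("with", "binoculars")
# The man has the binoculars: the PP sits inside the NP.
structure_a = merge("woman", merge("saw", merge("man", pp)))
# The woman has the binoculars: the PP is the sister of {saw, man}.
structure_b = merge("woman", merge(merge("saw", "man"), pp))

asym_a = asymmetric(c_command_pairs(structure_a))
asym_b = asymmetric(c_command_pairs(structure_b))
```

In the first structure "saw" asymmetrically c-commands everything inside the NP, including "binoculars"; in the second it c-commands only its sister "man". Sisters c-command each other, so they never enter the asymmetric relation.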
2.2.2 Sequences
Definition 4. (S, ∗, ‘’) is a monoid generated by W , where
6 Strictly speaking, the relation is left-locally total (Kayne, 1994). A left-locally total relation is total only on the elements
to the left of the relation (e.g., for aRb, R is left-locally total for a).
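[Figure 1 diagram not reproduced. Top node of the structure in which the PP attaches to the noun: {_V {woman}, {_V {saw}, {_N {man}, {_P {with}, {binoculars}}}}}]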
Figure 1: Hasse diagram of a subset of the partially ordered magma M , which displays two different structures
that map onto the sequence “woman saw man with binoculars”. Arrows indicate direct containment. The
subscript at the opening curly bracket of each binary set indicates the label of that set.
Figure 2: Hasse diagram of a subset of the partially ordered magma M , which displays two different structures
that map onto the sequence “woman saw man with binoculars”. Arrows indicate asymmetric c-command.
The subscript at the opening curly bracket of each binary set indicates the label of that set. The diagram in
(a) corresponds to the structure in (1a), where the man has the binoculars. The diagram in (b) corresponds
to the structure in (1b), in which the woman is holding the binoculars.
3. ∗ is the concatenation operation, which is unital and associative.
4. The empty sequence ‘’ is the identity element.
Definition 5. We define a binary relation (≺) on the sub-sequences in s = (x1, x2, ..., xn) ∈ S, which we call
precedence, where x1 ≺ x2 ≺ ... ≺ xn. Precedence is irreflexive, transitive, antisymmetric, and locally total.
Given Definition 5, precedence is a locally total relation on the set of elements in a sequence (i.e., the set of
terminal nodes in the tree structure). The Hasse diagram in Figure 3 visualizes the precedence relationship
for the sequence “woman saw man with binoculars”.
Figure 3: Hasse diagram of an element of the set of sequences S, which displays the sequential structure of
the sequence “woman saw man with binoculars”. Arrows indicate precedence.
Definition 6. We adopt the LCA as a surjective function f : M → S, defining f for a pair of lexical items
α, β ∈ m ∈ M , which holds for all elements of the sequence by induction:
    if α ⊳ β and β ⋫ α,                     (α asymmetrically c-commands β)
    or if α ≤ m̂ and m̂ ⊳ β and β ⋫ m̂,       (the mother node m̂ of α asymmetrically c-commands β)
    then α ≺ β ∈ f(m)                       (α precedes β)
where ⊳ denotes c-command and ⋫ its negation.
the containment relation in M . Consider Figure 4, where the constituent structures in M (left panel) are
mapped to the sequences in S (middle panel). By virtue of the containment relation by which the elements
of M are ordered, this mapping imposes a structure on the set of sequences (right panel) that is not there if
only their sequential properties are considered.
[Figure 4 panels not reproduced. Left: a subset of M, with the elements IP, VP, NP and PP ordered by containment. Middle: the sequences s1 to s10 in S. Right: the ordering imposed on S by f : M → S.]
s1 : woman saw man with binoculars
s2 : woman
s3 : saw man with binoculars
s4 : saw man
s5 : man with binoculars
s6 : man
s7 : with binoculars
s8 : man with
s9 : saw man with
s10 : man man with with
Figure 4: An ordering relationship is imposed on S via the structure in M . The leftmost panel contains a
subset of the partially ordered magma M , with the elements (denoted by their labels) ordered by containment.
The labels IP, VP, NP and PP refer to the labels of the constituents. Here, IP stands for Inflectional Phrase
(whose head contains information about tense and inflection), which is often used as the top-most constituent
in the structural representation of a sentence. A subset of S is shown in the middle panel, with example
sequences presented below the figure. The LCA function f : M → S maps elements in M to elements in S,
thus imposing a structural ordering relation on the sequential elements in S (rightmost panel).
If we only consider the sequential properties of the elements in S, a partial ordering already exists. This
partial ordering is based on string containment. For example, both “woman with” and “with binoculars” can
be said to be contained in the sequence “woman with binoculars”. Using the map f : M → S, we impose
a restriction on this ordering: for two elements m1 , m2 ∈ M , f (m1 ) ≤ f (m2 ) if m1 ≤ m2 . That is, two
sequences in S are contained in one another only if their constituent structures in M are contained in one
another. This imposed ordering restricts the initial ordering by excluding both ungrammatical sequences as
well as containment relations that are not the result of a structural relationship. For example, in the middle
panel of Figure 4, the subsequences s8 , s9 , and s10 do not appear in the imposed partial ordering. s10 is an
ungrammatical sequence and therefore has no valid structural analog in M . s8 and s9 are subsequences of
a grammatical sequence, yet they do not correspond to constituents and are therefore not retained in the
ordering. Thus, only strings that correspond to constituents are retained in the partial ordering, and this
partial ordering is based on constituent containment, as can be seen in the substructure in the right panel of
Figure 4.
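The effect of restricting string containment to constituent containment can be sketched as follows. The example assumes that linearization has already taken place, so ordered nested tuples stand in for the two structures, and the helper names are hypothetical.

```python
# Ordered stand-ins for the two structures after linearization.
STRUCT_A = ("woman", ("saw", ("man", ("with", "binoculars"))))
STRUCT_B = ("woman", (("saw", "man"), ("with", "binoculars")))


def words(node):
    """Left-to-right word string spanned by a node."""
    if isinstance(node, str):
        return node
    return " ".join(words(child) for child in node)


def constituent_strings(node):
    """Word strings that correspond to constituents of the structure."""
    found = {words(node)}
    if not isinstance(node, str):
        for child in node:
            found |= constituent_strings(child)
    return found


def contiguous_substrings(sentence):
    """All contiguous word sub-sequences (plain string containment)."""
    tokens = sentence.split()
    return {
        " ".join(tokens[i:j])
        for i in range(len(tokens))
        for j in range(i + 1, len(tokens) + 1)
    }


sentence = words(STRUCT_A)
by_string = contiguous_substrings(sentence)
retained = constituent_strings(STRUCT_A) | constituent_strings(STRUCT_B)
dropped = by_string - retained
```

With these two structures, the strings s1 to s7 are retained, while s8 ("man with") and s9 ("saw man with") are contiguous substrings that fall out of the ordering because no constituent spells them. A string such as s10 is not a contiguous substring of the sentence at all, so it never enters the comparison in this sketch.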
To sum up, we used the binary set formation operator ⊕ to generate hierarchical constituent structure.
From the resulting structure, whose containment relationships are visualized in Figure 1, we derive all c-
command relationships (see Figure 2). From these c-command relationships we derive a linear sequence with
precedence relationships using the LCA. Using the containment relationship in the partially ordered magma
(see Figure 1), we impose an ordering relation on the resulting set of sequences (see Figure 4). The latter is
possible because we define the algebraic structure corresponding to the set of structures as a magma, whose
combinatorial operator is non-associative. This allows us to generate strongly compositional structure, which
is a necessary requirement for any description of (the faculty of) language.
3 Hierarchical structure in actions
Having defined and formalized the properties of hierarchical structure in syntax, we will now consider whether
action hierarchies are analogous to syntactic hierarchies. Similar to the hierarchical structure underlying
sentences, action sequences are thought to be governed by hierarchically organized action plans (Botvinick,
2008; Cooper & Shallice, 2000, 2006; Holloway, 1969; Koechlin & Jubault, 2006; Lashley, 1951; Miller et al.,
1960; Rosenbaum et al., 2007).⁷ This structural analogy between linguistic syntax and actions has recently
received considerable attention from several corners of cognitive science (Boeckx & Fujita, 2014; Fadiga et
al., 2009; Jackendoff, 2007, 2009; Moro, 2014a, 2014b; Stout & Chaminade, 2009), in which the hierarchical
structure of actions is thought to be generated by an action syntax (Fitch & Martins, 2014; Fujita, 2014;
Maffongelli et al., 2019; Pulvermüller, 2014).
Figure 5: An example of a hierarchical decomposition of an action sequence, such as making tea. The terminal
nodes correspond to the atomic actions in Figure 6.
The idea is often illustrated using the example of tea- or coffee-making as a goal-directed behavioral
routine (Cooper & Shallice, 2000; Fitch & Martins, 2014; Fischmeister et al., 2017; Humphreys & Forde,
1998; Jackendoff, 2007, 2009; Kuperberg, 2020). A multi-step action, such as tea-making, can be decomposed
into discrete subsequences of actions, which in turn can be decomposed in sub-subsequences, and so on. Figure
5 shows a visual representation of the hierarchical part-whole structure of ‘making tea’. The highest level in
the hierarchy represents the complex, temporally extended and goal-directed action, middle levels represent
short-term, less complex subactions with their own subgoals, and the lowest level (terminal nodes) contains
atomic actions with immediate subgoals. Decomposing complex actions into these embedded (sub)sequences
is theoretically and empirically warranted because the subsequences may be used in different tasks, because
they are sometimes unintentionally omitted, repeated, or substituted as a whole, and because they all have
their own (sub)goal, which must be fulfilled in order to achieve the overarching goal (Cooper & Shallice,
2000, 2006; Humphreys & Forde, 1998; Lashley, 1951; Norman, 1981; Reason, 1979; Rosenbaum et al., 2007;
Schwartz, 2006).
about in a processing system (Badre, 2008; Tettamanti & Moro, 2012). The latter question belongs to the study of motor
control, which is also hierarchically organized but which has different properties: motor control is based on causal relations
(‘processing’ hierarchy), while actions should be described in terms of part-whole relations (‘representational’ hierarchy; see
Uithol et al., 2012 for discussion).
3. ⊕ is a binary set formation operation, which is non-associative and closed.
4. ∅ is the identity element.
Figure 6: Atomic actions for tea making. (a) fill kettle with water. (b) turn on kettle. (c) put teabag in
cup. (d) pour hot water into cup. (e) open fridge. (f) grab milk. (g) pour milk into cup. In the context of
Definition 7, A = {a, b, c, d, e, f, g}.
By defining the same binary relationship as used in Definition 2, we derive a partially ordered magma in which
the actions and action sets are partially ordered by containment. A subset of this partially ordered magma
is visualized in the Hasse diagram in Figure 7, which displays the containment relationship for two struc-
tures that map onto the same action sequence for making tea. Note that the two top-most action structures
are derived in a different way. This figure illustrates a crucial point about the (ir)relevance of hierarchical
structure in the interpretation of action sequences. That is, because the ⊕ operator is non-associative, the
order in which actions are combined using ⊕ affects the structure that is generated. Therefore, if we were to
interpret these structures in a strongly compositional way, we would have to conclude that they correspond to
different actions. This is clearly an undesirable conclusion, because the two structures correspond to one and
the same action sequence. In other words, adopting a non-associative combinatorial operator for generating
action structures makes the model too strong: it will differentiate two action structures that should not be
distinguished because they map onto the same action sequence and thus achieve the same goal in effectively
the same way.
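The point about non-associativity can be illustrated with a minimal sketch, assuming that binary set formation is modelled by Python's `frozenset`. The names `combine` and `atoms` are illustrative and not part of the formalism; the labels a, b, c follow Figure 6.

```python
def combine(x, y):
    """Binary set formation: x ⊕ y = {x, y}, modelled here as a frozenset."""
    return frozenset([x, y])

def atoms(structure):
    """Collect the atomic actions contained anywhere in a structure."""
    if isinstance(structure, frozenset):
        return frozenset().union(*(atoms(part) for part in structure))
    return frozenset([structure])

# Two derivations over the same atomic actions, combined in a different order.
left_nested = combine(combine("a", "b"), "c")   # {{a, b}, c}
right_nested = combine("a", combine("b", "c"))  # {a, {b, c}}

# The structures differ, although they contain exactly the same actions.
print(left_nested == right_nested)                # False
print(atoms(left_nested) == atoms(right_nested))  # True
```

Under this sketch the two structures are distinct objects even though they are built from the same atomic actions, which is the distinction that a strongly compositional reading would be forced to interpret.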
Figure 7: Hasse diagram of a partially ordered magma, which displays two different structures that map onto
the exact same action sequence for making tea. The boxes around action combinations represent binary sets,
and the arrows indicate direct containment.
\[ S = \bigcup_{g \in G} S_g \quad \text{and, if } g \neq h, \quad S_g \cap S_h = \emptyset \]
Here, we take g ∈ G as the set of part labels (i.e., the labels or ‘names’ given to each element in the partition).
Because all sequences in a given part are equivalent, we call every element s ∈ Sg a representative sequence
of that part Sg .
The partitioning of S yields a set of part labels that correspond to the set of goals they accomplish. These
goals can be interpreted as abstractions over action sequences that have something in common, namely the
change they bring about in the environment (see e.g., Cooper & Shallice, 2000).
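As a rough illustration of such a partition, the sketch below groups action sequences by the goal they achieve. The mapping `goal_of` and its goal names are hypothetical and chosen only for the example; the action labels follow Figure 6.

```python
from collections import defaultdict

# Hypothetical assignment of action sequences to the goal they accomplish.
goal_of = {
    ("a", "b", "c", "d"): "black tea",
    ("a", "c", "b", "d"): "black tea",
    ("c", "a", "b", "d"): "black tea",
    ("a", "b"): "hot water",
}

def partition_by_goal(goal_of):
    """Group sequences into parts S_g, one part per goal label g."""
    parts = defaultdict(set)
    for sequence, goal in goal_of.items():
        parts[goal].add(sequence)
    return dict(parts)

parts = partition_by_goal(goal_of)

# The parts cover the whole set of sequences and do not overlap.
covers = set().union(*parts.values()) == set(goal_of)
disjoint = parts["black tea"].isdisjoint(parts["hot water"])
print(covers, disjoint)  # True True
```

Any sequence in a part, for example `("a", "c", "b", "d")` in the part labelled "black tea", can serve as a representative sequence of that part.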
3.2.2 Generating structured sequences
Definition 9. We define action structure as (G, ⊗, ∗, ∅), where:
1. The elements of G are part labels (see Definition 8) corresponding to action sequences that generate a
particular goal.
2. ⊗ and ∗ are two sequence building operators that generate the elements of G, with ∅ as the identity
element.
Note that we include the set of atomic actions in G since it can be argued that an atomic action has the same
status as a goal in that it can be decomposed into smaller sub-actions, each achieving a particular change in
the environment. Therefore, an atomic action is simply an equivalence class with only one element.
A goal can often be achieved in several ways. For example, given the actions in Figure 6, the goal ‘make
black tea’ corresponds to the part Sb , where Sb = {(a, b, c, d), (a, c, b, d), (c, a, b, d)}. Here, a (‘fill kettle with
water’) must precede b (‘turn on kettle’), which in turn must precede d (‘pour hot water into cup’), so the
temporal ordering of (a, b, d) is fixed. However, the position of action c (‘put teabag into cup’) within this
action sequence is specified only in relation to d; it can be placed at any position before d within (a, b, d), thus
yielding three action sequences. In other words, for a given goal to be achieved, the temporal ordering of some
actions must be specified, whereas it need not be specified for other actions. We achieve this combination
of the requirement of strict temporal ordering with temporal flexibility via the use of two sequence building
operators.
Definition 10. ∗ is a sequence building operation ∗ : G × G → G. Let a, b ∈ G be two part labels, and
sa ∈ Sa and sb ∈ Sb be two representative sequences, where sa = (a1 , a2 , ...an ), sb = (b1 , b2 , ...bm ).
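∗ concatenates two sequences and is associative.
\[ c_i = \begin{cases} a_j & \text{if } i > 1 \text{ and } a_{j-1} \in (c_1 \ldots c_{i-1}) \\ b_j & \text{if } i > 1 \text{ and } b_{j-1} \in (c_1 \ldots c_{i-1}) \\ a_1 \text{ or } b_1 & \text{otherwise} \end{cases} \]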
The operator ∗ is simple concatenation. This operator is required because the temporal ordering of some
actions must be specified. For example, if h ∈ G represents the (sub)goal of obtaining hot water, then we
must define Sh = a ∗ b due to the requirement that the kettle should be filled with water (action a) before it
is turned on (action b). The temporal precedence relationship between these two actions requires an operator
that yields strict sequential orders. Clearly, concatenation is not commutative: the sequence generated by
sa ∗ sb is different from the output of sb ∗ sa . Moreover, because ∗ generates sequences whose only structural
relationship is precedence, it is associative: sa ∗ (sb ∗ sc ) = (sa ∗ sb ) ∗ sc . In sum, (G, ∗) forms a monoid, which
is an algebraic structure consisting of a set equipped with an operation that is closed, unital and associative
(see Box 1).
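The monoid properties of concatenation can be checked with a small sketch, assuming that action sequences are represented as Python tuples and that the empty tuple plays the role of the identity element. The function name `concat` is illustrative.

```python
def concat(s_a, s_b):
    """The operator ∗: append s_b to s_a, keeping both internal orders."""
    return tuple(s_a) + tuple(s_b)

EMPTY = ()  # identity element

s_a, s_b, s_c = ("a",), ("b",), ("d",)

# Associative: the grouping of the operands does not matter.
associative = concat(concat(s_a, s_b), s_c) == concat(s_a, concat(s_b, s_c))
# Not commutative: swapping the operands changes the sequence.
commutative = concat(s_a, s_b) == concat(s_b, s_a)
# Unital: concatenating with the empty sequence changes nothing.
unital = concat(EMPTY, s_a) == s_a == concat(s_a, EMPTY)

print(associative, commutative, unital)  # True False True
```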
The operator ⊗ generates sets of sequences whose orders vary, with the only constraint that the relative
ordering within its arguments is retained. For example, for two sequences sa = (a, b, d), sb = (c), sa ⊗ sb =
{(a, b, d, c), (a, b, c, d), (a, c, b, d), (c, a, b, d)}. In each of the sequences generated by sa ⊗ sb , a precedes b, which
precedes d. So while ⊗ allows for flexibility in terms of the order of the actions in the sequences, the flexibility
is constrained by the sequential properties of sa , whose precedence relations must be retained in the output
of sa ⊗ sb . Because ⊗ is a sequence-building operator that is constrained only by the sequential properties
of its input (i.e., the ordering within its input arguments), ⊗ is associative. But as it does not specify the
ordering among its input arguments, ⊗ is also commutative. (G, ⊗) therefore forms a commutative monoid
(see Box 1).
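The behaviour of ⊗ on the example above can be reproduced with a short sketch, assuming that ⊗ is implemented as the set of all interleavings of its two arguments. The function name `shuffle` is illustrative.

```python
def shuffle(s_a, s_b):
    """The operator ⊗: all interleavings of s_a and s_b in which the
    internal order of each argument is retained."""
    if not s_a:
        return {tuple(s_b)}
    if not s_b:
        return {tuple(s_a)}
    # The next element is either the head of s_a or the head of s_b.
    head_a = {(s_a[0],) + rest for rest in shuffle(s_a[1:], s_b)}
    head_b = {(s_b[0],) + rest for rest in shuffle(s_a, s_b[1:])}
    return head_a | head_b

s_a, s_b = ("a", "b", "d"), ("c",)
result = shuffle(s_a, s_b)

for sequence in sorted(result):
    print(sequence)

# Swapping the arguments yields the same set of sequences.
print(result == shuffle(s_b, s_a))  # True
```

In every sequence of `result`, a precedes b and b precedes d, while c is free to occupy any of the four positions.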
The two notions of precedence and flexibility are combined in (G, ⊗, ∗), which is an algebraic structure
called a trace monoid (also called partially commutative monoid; see Box 1). A trace monoid is a monoid
of traces, which are sets of sequences that form equivalence classes (Mazurkiewicz, 1995). In (G, ⊗, ∗) the
traces contain equivalent sequences generated by ⊗ and ∗. In a trace monoid, two sequences are equivalent
if they only differ in the order of a pair of elements for which an independency relation is defined8 . These
independent elements are allowed to commute in the sequences of the equivalence class9 . Consider the
independency I = {(b, c), (c, b)}, which holds that the actions b and c are allowed to commute; no precedence
relation between them is specified. Given (b, c) ∈ I, we say that two action sequences are equivalent if they
differ only in the ordering of b and c. The trace monoid is then said to contain a trace where abc ∼ acb.
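Trace equivalence under an independency relation can be sketched as follows, assuming that a trace is computed by repeatedly swapping adjacent independent elements. The function name `trace_of` is illustrative and not taken from Mazurkiewicz (1995).

```python
def trace_of(sequence, independency):
    """Equivalence class of `sequence`: every sequence reachable by
    swapping adjacent elements related by the independency relation."""
    start = tuple(sequence)
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for i in range(len(current) - 1):
            x, y = current[i], current[i + 1]
            if (x, y) in independency:
                swapped = current[:i] + (y, x) + current[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    frontier.append(swapped)
    return seen

# b and c are independent: no precedence relation between them is specified.
I = {("b", "c"), ("c", "b")}
trace = trace_of("abc", I)

print(sorted(trace))  # [('a', 'b', 'c'), ('a', 'c', 'b')]
```

Because a is not independent of b or c, its position relative to them is fixed, so only the relative order of b and c varies within the trace.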
To sum up, defining the trace monoid (G, ⊗, ∗) allows us to achieve simultaneously temporal precedence
and temporal flexibility. The operator ∗ is required to build sequences where temporal precedence is necessary
(e.g., ‘grab milk’ must precede ‘pour milk into cup’), and ⊗ is used to generate action sequences whose
temporal relationship is not specified (e.g., the action ‘grab milk’ can precede or follow ‘put teabag in cup’).
The combined use of ∗ and ⊗ leads to equivalence classes of sequences that contain a mixing of intermediate
goals that are temporally independent of other intermediate goals. The mixing procedure introduced by
⊗ might destroy immediate precedence (or temporal adjacency) relationships in the output of ∗, but this
is unproblematic: while it makes sense to let ‘open fridge’ be directly followed by ‘grab milk’, this is not
necessary. One could open the fridge, perform all other tea-making preparations, and then grab the milk.
8 Independency relations are symmetric (i.e., if (a, b) is present, then so is (b, a)) and irreflexive (i.e., there are no relations
of type (a, a)), and can be extended to relations between sequences (see Mazurkiewicz, 1995).
9 Commutativity in the general sense is slightly different from the way it is used in the context of traces. In the general sense
(as used in Box 1), it refers to an operation which produces the same output if the order of the operands is changed, such as in
a ⊗ b = b ⊗ a. In the context of a trace monoid, the notion of sameness is replaced by equivalence, where a ⊗ b = {ab, ba}, and
ab ≠ ba but ab ∼ ba.
Box 3. Inferring plans from action sequences
In order to achieve a given goal, the relative order of some related actions must be specified, whereas
that of some unrelated actions can be left undefined. Given a set of observed action sequences
that successfully reach the same goal, the abstract plan to reach that goal can be extracted via the
intersection of the sets of binary relations representing the sequences.
As a simple illustration, consider a sequence x = (a, b, c, d) consisting of non-repeating atomic
actions. The precedence relations for x are a ≺ b ≺ c ≺ d. The sequence can be represented as a set
of binary relations. If we take these binary relations to represent precedence, x will be represented
as {(a, b), (b, c), (c, d), (a, d), (a, c), (b, d)}. By observing only x, it is not immediately clear which of
these elements are dependent and which are not. However, observing the sequence y = (c, a, b, d)
(represented as y = {(c, a), (a, b), (b, d), (c, d), (c, b), (a, d)}), which achieves the same goal, provides
more information. The plan to reach the goal is represented by the intersection of the sets of binary
relations:
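x ∩ y = {(a, b), (a, d), (b, d), (c, d)}
The pair (a, d) follows from (a, b) and (b, d) by transitivity, so the three remaining pairs suffice to describe the partial order.
This intersection corresponds to the plan of making black tea (see Figure 6). Notice how this partial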
order is compatible with the previously unseen sequence (a, c, b, d), which reaches the same goal
successfully as well.
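The procedure described in Box 3 can be sketched in a few lines, assuming non-repeating atomic actions and representing each sequence by its set of precedence pairs. The function names `precedence` and `compatible` are illustrative. Note that the computed intersection also contains the pair (a, d), which follows from (a, b) and (b, d) by transitivity.

```python
from itertools import combinations

def precedence(sequence):
    """All pairs (x, y) such that x occurs before y in the sequence."""
    return set(combinations(sequence, 2))

def compatible(sequence, plan):
    """A sequence fits a plan if it respects every precedence pair in it."""
    return plan <= precedence(sequence)

x = precedence(("a", "b", "c", "d"))
y = precedence(("c", "a", "b", "d"))

# The plan is what the two observed sequences have in common.
plan = x & y
print(sorted(plan))  # [('a', 'b'), ('a', 'd'), ('b', 'd'), ('c', 'd')]

# The unseen sequence (a, c, b, d) respects the plan; (a, b, d, c) does not,
# because the teabag (c) would be added after the water is poured (d).
print(compatible(("a", "c", "b", "d"), plan))  # True
print(compatible(("a", "b", "d", "c"), plan))  # False
```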
1. Unbounded. It has been argued that the combinatorial operation involved in building syntactic struc-
tures evolved from pre-existing systems for tool use, also called “Action Merge” (Fujita, 2014, 2017).
This operation is thought to apply recursively (Fujita, 2017; Pulvermüller, 2014; Stout & Chaminade,
2009), even though Action Merge is bounded (Fujita, 2014). A distinctive feature of recursively gener-
ated hierarchical output is self-similarity across levels: recursively generated structures are characterized
by self-embedding of ‘tokens’ of the same ‘type’ (Martins, 2012). A first step would therefore be to
examine whether action hierarchies are self-similar. However, that requires that we know what the
types are. Consider the structure in Figure 5. One could combine ‘open fridge’ and ‘grab milk’ into
an action constituent, which could be labeled ‘get milk’. Here, it is unclear whether ‘get milk’ is of
the same type as ‘grab milk’. Moreover, it seems plausible that the action ‘pour hot water into cup’
is similar to ‘pour milk into cup’, but that is because the tokens are similar (both involve pouring),
not necessarily because their types are. Without a theoretical specification of the types of actions, it
cannot be determined whether actions are recursively generated.
Our primary goal is to evaluate the claim that the hierarchical structures found in language and
action are analogous. The validity of this claim rests on positive evidence that actions, like language,
are recursively generated. In the absence of such evidence, it is premature to conclude that actions are
structurally analogous to language.
2. Endocentric. Some of the hierarchical representations of actions that are proposed in the literature
contain action constituents with one key element, or ‘head’, which performs the core of the action
and determines its (end)goal (e.g., Jackendoff, 2007, 2009, 2011; Fischmeister et al., 2017). While this
makes the structures ‘headed’, it does not make them endocentric. That is, it seems that this head
merely serves to describe the main action of the action sequence, rather than to provide a label for
the constituent it is dominated by. In Figure 5, for instance, the action constituent formed by the
combination of ‘open fridge’ and ‘grab milk’ is not a type of either of these actions. In line with the
idea that endocentricity is unique to language (Boeckx, 2009; Hornstein, 2009), action hierarchies seem
to be exocentric.
A plausible reason for the difficulty in assigning labels to action constituents is that actions do
not have clear conceptual units, such as words (Moro, 2014a; Berwick et al., 2011), and that groups
of actions do not obligatorily fall into a closed set of distinguishable categories, such as NP or VP
(Jackendoff & Pinker, 2005). Without these categories, groupings of actions into constituents cannot
be labeled or ‘syntactically named’, which means that there are no grammatical constraints on how the
resulting constituents can be used in further combinations.
3. Unordered. Representations of actions are intimately tied to the physical environment in which the
actions are performed (Graves, 1994; Kuperberg, 2020; Moro, 2014a; Zaccarella et al., 2021). As
such, they are not order-independent: some sub-actions must precede others in order for the action to
achieve its goal (Fitch & Martins, 2014), and indeed, the output of Action Merge is inherently ordered
(Fujita, 2014)10 . Comparing this to language, we see that the externalization of spoken language is also
sequential, but that sequential order does not play a role in the representation of syntactic relations,
which are invariably structure-dependent.
It has been proposed that closely related actions, which can be separated by arbitrarily many ‘em-
bedded’ actions (e.g., [open door [switch on light [brush teeth] switch off light] close door]), are similar
to long-distance dependencies in language (Pulvermüller & Fadiga, 2010; Pulvermüller, 2014). This
analogy is incorrect, however, because long-distance dependencies are related to the hierarchical organi-
zation of constituent structure. These action dependencies, instead, have serial and temporal properties:
you cannot close a door before having opened it (Dominey et al., 2003; Moro, 2015; Zaccarella et al.,
2021). If they were truly hierarchical, the embedded action would be expected to adhere to structural
restrictions on its distribution, which would be the case if the embedding of [brush teeth] at a different
position, like in [open door [brush teeth] [switch on light switch off light] close door], were not allowed.
Moreover, if the dependency were hierarchical, it should not be affected by linearly or temporally in-
tervening actions, so whatever happens during [brush teeth] should not be able to affect the action
[switch off light]. As neither appears to be the case, it is more appropriate to label the dependency
between two actions temporal (or causal) rather than hierarchical (Moro, 2014b, 2015). Indeed, actions
and events can be understood in terms of temporal (and causal) structure (McRae et al., 2019; Zacks
& Tversky, 2001), and oddly ordered complex actions, which are thought of as ungrammatical actions
(e.g., Maffongelli et al., 2019), reflect the violation of ‘temporal rules’ rather than phrase-structure rules
(Zaccarella et al., 2021).
10 Even under an analysis in which immediate precedence plays a role in syntax (as in Kayne, 2011), the crucial difference
between language and actions remains: if two linguistic objects α and β are not adjacent in their base-generated position
(i.e., they do not form the ordered pair ⟨α, β⟩), their relationship is defined as a relationship that refers to the (hierarchical)
constituents they are contained in, not as a relationship that refers to their linear or temporal order. There is no such constraint
in actions, where some actions must precede (distant) others, regardless of the relationship between the action constituents in
which they are contained.
A plausible reason for the observation that none of the properties of hierarchy in syntax are found in ac-
tions is that the analogy between language and action is not to be found in syntactic structure, but rather
in conceptual structure (Jackendoff, 2007; Zaccarella et al., 2021). An important difference is that syntax
is computationally autonomous, having its own principles and properties that cannot be reduced to other
factors, such as meaning (Adger, 2018; Berwick, 2018; Chomsky, 1957). The application of these principles
is constrained by economy conditions (e.g., locality, minimality; see Collins, 2001), but not by whether they
generate interpretable output. Therefore, in language there is an independent notion of grammaticality:
sentences are ungrammatical if their structures cannot be generated by the rules of syntax, or if they violate
conditions on these rules. One way to illustrate this is by means of interpretable but nevertheless ungram-
matical sentences. A sentence such as “which boy did they meet the girl who insulted?” is ungrammatical but
can be interpreted (i.e., corresponding to the logical statement “for which x, x a boy, did they meet the girl
who insulted x?”). Its deviance is due to the violation of a purely formal (locality) principle constraining the
grammar, which is unrelated to its semantic interpretability. Conversely, the sentence “colorless green ideas
sleep furiously” is semantically odd, yet fully grammatical (Adger, 2018; Berwick, 2018; Chomsky, 1957),
showing that grammaticality does not boil down to meaningfulness or interpretability.
In contrast, the validity of action sequences seems related to their coherence, in terms of both logical con-
sistency and environmental appropriateness. It has been suggested that a complex action is ‘ungrammatical’
or ‘ill-formed’ if its sub-parts are ordered in such a way that the action’s overall goal cannot be achieved
(Jackendoff, 2007; Maffongelli et al., 2019). The ‘grammaticality’ of an action is thus intimately tied to
the fulfillment of its goal, showing that the notion ‘ungrammatical’ is very different for action sequences
and sentences (Graves, 1994; Zaccarella et al., 2021). On this interpretation, an ‘ungrammatical’ action is
similar to a sentence which does not convey the intended meaning, either because it is logically incoherent
or because it is situationally inappropriate. The action equivalent of a logically incoherent sentence could be
an action sequence in which a coffee grinder is turned on before the coffee beans are added. This is logically
incoherent because it violates causality principles of the physical environment. An action like turning off the
light when walking into your office during nighttime, instead, does not violate such constraints, but it would
be situationally inappropriate because it would preclude you from seeing anything (e.g., Reason, 1979).
Because there is no autonomous action syntax, there is no independent notion of grammaticality, devoid of
goal-dependent meaning. As a result, it is unclear how to evaluate whether a given structural decomposition
of complex actions into constituents is veridical if we do not know the goal or general conceptual content
of the action (Berwick & Chomsky, 2017; Jackendoff, 2007). It seems that the decomposition of an action
sequence into a hierarchical tree structure only works to the extent that the subactions are meaningful or
coherent (i.e., represent subgoals).
organization’ in terms of sequential vs. internal hierarchy, describing respectively the computation of sequential hierarchical
information (externalized) and the computation of non-linear hierarchical relations (mind-internal).
as whether heads precede or follow their dependents (e.g., English vs. Japanese). Hierarchical sets are
further abstractions from these hierarchical sequences, which are not realized in the physical properties of
the linguistic signal (left panel, top row in Figure 8). This level is explanatorily relevant for syntactic theory
because it naturally captures the properties of syntax described in Section 2.1: hierarchical sets generated by
Merge are unbounded, endocentric, and unordered (Lasnik, 2000). Therefore, at this level, we can account
for both structural relations within languages and structural generalizations between languages (e.g., the
head-dependent relations in English and Japanese are identical at the level of hierarchical sets).
Figure 8: Three representational levels to describe hierarchically structured sequential information. The
levels become increasingly concrete, from what is abstractly represented (on the left) to what is physically
observed (on the right). The symbol at each node in the hierarchical structures indicates which operator
was used to combine the elements, thus representing the derivational history of the sets (left) or sequences
(middle).
As we noted in Section 3.2.3, actions might be seen as ordered sequences of events that are hierarchically
structured (middle panel, bottom row of Figure 8). However, the structural properties of action sequences
cannot be described in a way which has completely abstracted away from the physical instantiation of the
action sequence, most clearly because the (representation of the) hierarchical structure of action sequences
contains information about temporal order. Because the properties of syntactic structure are not found in
actions, we do not have to postulate hierarchical sets as an explanatorily or functionally relevant level of
abstraction for actions.
The distinction between hierarchical sets and sequences is useful in explaining why it has been found that
the brain areas involved in language processing (in particular, Brodmann’s area 44 in the left inferior frontal
gyrus) are also activated in response to tasks involving hierarchically organized actions (e.g., Higuchi et al.,
2009; Koechlin & Jubault, 2006). These results support the idea that there is a supramodal hierarchical
processor in the brain which processes the hierarchical structures of cognitive systems such as language
and action (Fadiga et al., 2009; Fazio et al., 2009; Fiebach & Schubotz, 2006; Higuchi et al., 2009; Jeon,
2014; Koechlin & Jubault, 2006; Tettamanti & Weniger, 2006). Crucially, however, instead of describing
complex actions in terms of non-linear relations defined over hierarchical structures, these accounts refer
to the processing of sequences which happen to have hierarchical structure (for a related discussion, see
Martins et al., 2019 and Zaccarella et al., 2021). The overlapping activation patterns for language and action
might therefore point not to shared brain regions processing hierarchical, non-linear relations (operating over
hierarchical sets), but rather to shared brain regions implicated in the linearization of hierarchically structured
information (i.e., hierarchical sequences; see also Boeckx et al., 2014; Matchin & Hickok, 2020; Uddén &
Bahlmann, 2012). To account for the former process, a reconceptualization of how linguistic representations
might be coded in neural systems is likely needed (Martin, 2016, 2020; Martin & Doumas, 2017; Meyer et al.,
2019). We believe that a fruitful avenue for further investigation into the relationship between language and
action concerns the externalization of hierarchically organized information into structured sequences rather
than the (generation of the) hierarchical structure itself. The overlap between language and action then has
to do with the fact that, externally, both are structured sequences, even though their internal structures are
quite different.
5 Conclusions
In response to the claim that language and action are analogous because they are both organized hierarchi-
cally, we argued in this paper that the formal properties of hierarchy in both domains are fundamentally
different. Our main argument is that the language system can embody strong compositionality, as both
syntactic rules and semantic interpretation are structure-dependent. Structural analyses in language are
thus concerned with non-terminal nodes in the hierarchical structure of syntax. Actions, instead, are weakly
compositional: regularities in action structures are dependent on the temporal order of the atomic actions,
not on their hierarchical organization into action goals. Analyses of actions are thus concerned with terminal
nodes in the action hierarchy. Based on this difference, we argue that the structure of syntax is best described
as a system of hierarchical sets, whereas action structures can be described as hierarchical sequences.
In order to formally capture the strong compositionality of language, we described the algebraic structure
corresponding to the ordered set of hierarchical structures in language as a magma, whose non-associative
combinatorial operator was defined as binary set formation. This set-based formalism integrates the three
properties of syntactic structure (i.e., unboundedness, endocentricity, and unorderedness) with the description
of syntax as a system of hierarchical sets and the fact that language exhibits strong compositionality. When
this model was applied to actions, it appeared to be both too strong (i.e., it makes structural distinctions
which should not be made) and too weak (i.e., it does not capture the importance of temporal precedence).
We therefore proposed an alternative model for actions, which used two sequence building operators that
organize actions by sequential relations. This yielded an ordered set of action structures that could be
described as a trace monoid. The associativity of the two operators illustrates how actions exhibit a weaker
form of compositionality, which is based on sequential rather than hierarchical structure. This aligns well
with our argument that actions are best described in terms of hierarchical sequences. In sum, the formal
tools needed to describe language are fundamentally different from those required to describe the action
system. We believe that this result has important implications not only for comparative cognitive science
but also for cognitive neuroscience, as it points to differences in the ways in which hierarchies are represented
in the brain.
6 Glossary
• Compositionality: Property of a system which holds that the meaning of a complex expression is built
up from the meanings of its parts and the way in which they are combined.
• C-command: Structural relation between nodes in a hierarchical structure. Node α c-commands a node
β iff β is (contained in) the sister node of α.
• Endocentricity: Property of syntactic structures. A structural combination is endocentric if it fulfills
the same grammatical function as one of its parts. Endocentric structures are contrasted with exocentric
structures, in which the grammatical function of the combination is not the same as that of its parts.
• Equivalence class: A subset (of a set) containing elements that are related under the equivalence relation.
An equivalence class is denoted by square brackets: [a] = {b|b ∼ a}.
• Equivalence relation: An equivalence relation on a set is a binary relation that is reflexive, symmetric,
and transitive. An equivalence relation is indicated by ∼, so a ∼ b denotes that a is equivalent to b.
• Hasse diagram: A mathematical diagram used to visualize an ordered set.
• Linear Correspondence Axiom: An algorithm that maps hierarchical structure onto linear order using
the c-command relation.
• Merge: Combinatorial procedure to generate syntactic structure, formally defined as binary set formation:
Merge(α,β) = {α, β}.
• Magma: An algebraic structure consisting of a set equipped with a binary operation which is closed but
need not be associative.
• Monoid: An algebraic structure consisting of a set equipped with a binary operation that is closed, unital
and associative.
• Partial order: A set with a binary relation that is transitive and antisymmetric.
• Partition: A partition of a set S is a set of non-empty subsets of S that cover S and are pairwise disjoint.
• Recursion: Property of a function, which lets it take its output as its input.
• Surjective: Mathematical property of a function f : X → Y which holds that for every element y ∈ Y
there exists at least one element x ∈ X such that f (x) = y.
• Total order: A set with a binary relation that is transitive, antisymmetric, and total.
7 Acknowledgements
We thank Giosuè Baggio, Cedric Boeckx, Helen de Hoop, and Bob van Tiel for helpful comments on earlier
versions of this work. We also thank Steven Phillips for help with aspects of the formal work. For annotating
the equations in our formal work we used the code from https://github.com/synercys/annotated_latex_equations.
Andrea E. Martin was supported by a Max Planck Research Group and a Lise Meitner Research
Group “Language and Computation in Neural Systems” from the Max Planck Society and by the Netherlands
Organisation (NWO) for Scientific Research (grant 016.Vidi.188.029).
References
Adger, D. (2018). The autonomy of syntax. In N. Hornstein, H. Lasnik, P. Patel-Grosz, & C. Yang (Eds.),
Syntactic Structures after 60 Years: The Impact of the Chomskyan Revolution in Linguistics (pp.
153–175). De Gruyter Mouton.
Badre, D. (2008). Cognitive control, hierarchy, and the rostro–caudal organization of the frontal lobes.
Trends in Cognitive Sciences, 12 (5), 193–200. doi: 10.1016/j.tics.2008.02.004
Baggio, G. (2021). Compositionality in a parallel architecture for language processing. Cognitive Science,
45 (5), e12949. doi: 10.1111/cogs.12949
Berwick, R. C. (2018). Revolutionary new ideas appear infrequently. In N. Hornstein, H. Lasnik, P. Patel-
Grosz, & C. Yang (Eds.), Syntactic Structures after 60 Years: The Impact of the Chomskyan Revolution
in Linguistics (pp. 177–194). De Gruyter Mouton.
Berwick, R. C., & Chomsky, N. (2017). Why only us: Recent questions and answers. Journal of Neurolin-
guistics, 43 , 166–177. doi: 10.1016/j.jneuroling.2016.12.002
Berwick, R. C., Okanoya, K., Beckers, G. J. L., & Bolhuis, J. J. (2011). Songs to syntax: The linguistics of
birdsong. Trends in Cognitive Sciences, 15 (3), 113–121. doi: 10.1016/j.tics.2011.01.002
Bloom, P. (1994). Generativity within language and other cognitive domains. Cognition, 51 (2), 177–189.
doi: 10.1016/0010-0277(94)90014-0
Boeckx, C. (2009). The nature of Merge: Consequences for language, mind, and biology. In M. Piattelli-
Palmarini, J. Uriagereka, & P. Salaburu (Eds.), Of Minds and Language: A Dialogue with Noam
Chomsky in the Basque Country (pp. 44–57). Oxford: Oxford University Press.
Boeckx, C., & Fujita, K. (2014). Syntax, action, comparative cognitive science, and Darwinian thinking.
Frontiers in Psychology, 5 . doi: 10.3389/fpsyg.2014.00627
Boeckx, C., Martinez-Alvarez, A., & Leivada, E. (2014). The functional neuroanatomy of serial order in
language. Journal of Neurolinguistics, 32 , 1–15. doi: 10.1016/j.jneuroling.2014.07.001
Botvinick, M. M. (2008). Hierarchical models of behavior and prefrontal function. Trends in Cognitive
Sciences, 12 (5), 201–208. doi: 10.1016/j.tics.2008.02.009
Chomsky, N. (1957). Syntactic Structures. The Hague: Mouton de Gruyter.
Chomsky, N. (1959). A Review of B. F. Skinner’s Verbal Behavior. Language, 35 (1), 26–58. doi: 10.2307/
411334
Chomsky, N. (1995a). Bare phrase structure. In H. R. Campos & P. M. Kempchinsky (Eds.), Evolution
and Revolution in Linguistic Theory: Essays in Honor of Carlos Otero (pp. 51–109). Washington, DC:
Georgetown University Press.
Chomsky, N. (1995b). The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, N. (2013). Problems of projection. Lingua, 130 , 33–49. doi: 10.1016/j.lingua.2012.12.003
Collins, C. (2001). Economy conditions in syntax. In M. Baltin & C. Collins (Eds.), The Handbook of
Contemporary Syntactic Theory (pp. 45–61). Oxford: Blackwell Publishers.
Collins, C. (2017). Merge(X,Y) = {X,Y}. In L. Bauke & A. Blümel (Eds.), Labels and Roots (pp. 47–68).
De Gruyter Mouton. doi: 10.1515/9781501502118-003
Cooper, R. P., & Shallice, T. (2000). Contention scheduling and the control of routine activities. Cognitive
Neuropsychology, 17 (4), 297–338. doi: 10.1080/026432900380427
Cooper, R. P., & Shallice, T. (2006). Hierarchical schemas and goals in the control of sequential behavior.
Psychological Review, 113 (4), 887. doi: 10.1037/0033-295X.113.4.887
Corballis, M. C. (1991). The Lopsided Ape: Evolution of the Generative Mind. Oxford: Oxford University
Press.
de Waal, F. B. M., & Ferrari, P. F. (2010). Towards a bottom-up perspective on animal and human cognition.
Trends in Cognitive Sciences, 14 (5), 201–207. doi: 10.1016/j.tics.2010.03.003
Dominey, P. F., Hoen, M., Blanc, J.-M., & Lelekov-Boissard, T. (2003). Neurological basis of language and
sequential cognition: Evidence from simulation, aphasia, and ERP studies. Brain and Language, 86 (2),
207–225. doi: 10.1016/S0093-934X(02)00529-1
Everaert, M. B. H., Huybregts, M. A. C., Chomsky, N., Berwick, R. C., & Bolhuis, J. J. (2015). Structures,
not strings: Linguistics as part of the cognitive sciences. Trends in Cognitive Sciences, 19 (12), 729–743.
doi: 10.1016/j.tics.2015.09.008
Fadiga, L., Craighero, L., & D’Ausilio, A. (2009). Broca’s Area in language, action, and music. Annals of
the New York Academy of Sciences, 1169 (1), 448–458. doi: 10.1111/j.1749-6632.2009.04582.x
Fazio, P., Cantagallo, A., Craighero, L., D’Ausilio, A., Roy, A. C., Pozzo, T., . . . Fadiga, L. (2009). Encoding
of human action in Broca’s area. Brain, 132 (7), 1980–1988. doi: 10.1093/brain/awp118
Fiebach, C. J., & Schubotz, R. I. (2006). Dynamic anticipatory processing of hierarchical sequential events:
A common role for Broca’s area and ventral premotor cortex across domains? Cortex, 42 (4), 499–502.
doi: 10.1016/S0010-9452(08)70386-1
Fischmeister, F. P., Martins, M. J. D., Beisteiner, R., & Fitch, W. T. (2017). Self-similarity and recursion
as default modes in human cognition. Cortex, 97, 183–201. doi: 10.1016/j.cortex.2016.08.016
Fitch, W. T. (2010). Three meanings of “recursion”: Key distinctions for biolinguistics. In H. Yamakido,
R. K. Larson, & V. Déprez (Eds.), The Evolution of Human Language: Biolinguistic Perspectives (pp.
73–90). Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511817755.005
Fitch, W. T., & Martins, M. D. (2014). Hierarchical processing in music, language, and action: Lashley
revisited. Annals of the New York Academy of Sciences, 1316 (1), 87–104. doi: 10.1111/nyas.12406
Fujita, K. (2014). Recursive Merge and human language evolution. In T. Roeper & M. Speas (Eds.),
Recursion: Complexity in Cognition (pp. 243–264). Springer.
Fujita, K. (2017). On the parallel evolution of syntax and lexicon: A Merge-only view. Journal of
Neurolinguistics, 43, 178–192. doi: 10.1016/j.jneuroling.2016.05.001
Fukui, N. (2011). Merge and Bare phrase structure. In C. Boeckx (Ed.), The Oxford Handbook of Linguistic
Minimalism. Oxford: Oxford University Press. doi: 10.1093/oxfordhb/9780199549368.013.0004
Fukui, N., & Zushi, M. (2004). Introduction. In N. Chomsky (Ed.), The Generative Enterprise Revisited:
Discussions with Riny Huybregts, Henk van Riemsdijk, Naoki Fukui and Mihoko Zushi (pp. 1–25).
Berlin: Walter de Gruyter.
Graves, P. (1994). Flakes and ladders: What the archaeological record cannot tell us about the origins of
language. World Archaeology, 26 (2), 158–171. doi: 10.1080/00438243.1994.9980270
Greenfield, P. M. (1991). Language, tools and brain: The ontogeny and phylogeny of hierarchically organized
sequential behavior. Behavioral and Brain Sciences, 14 (4), 531–551. doi: 10.1017/S0140525X00071235
Guest, O., & Martin, A. E. (2021). How computational modeling can force theory building in psychological
science. Perspectives on Psychological Science, 1–14. doi: 10.1177/1745691620970585
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and
how did it evolve? Science, 298 (5598), 1569–1579. doi: 10.1126/science.298.5598.1569
Hauser, M. D., Yang, C., Berwick, R. C., Tattersall, I., Ryan, M. J., Watumull, J., . . . Lewontin, R. C. (2014).
The mystery of language evolution. Frontiers in Psychology, 5. doi: 10.3389/fpsyg.2014.00401
Higuchi, S., Chaminade, T., Imamizu, H., & Kawato, M. (2009). Shared neural correlates for language and
tool use in Broca’s area. NeuroReport, 20 (15), 1376–1381. doi: 10.1097/WNR.0b013e3283315570
Holloway, R. L. (1969). Culture: A human domain. Current Anthropology, 10, 395–412.
Hornstein, N. (2009). A Theory of Syntax: Minimal Operations and Universal Grammar. Cambridge:
Cambridge University Press.
Hornstein, N. (2017). On Merge. In J. McGilvray (Ed.), The Cambridge Companion to Chomsky (Second
ed., pp. 69–86). Cambridge: Cambridge University Press. doi: 10.1017/9781316716694.004
Humphreys, G. W., & Forde, E. M. E. (1998). Disordered action schema and action disorganisation syndrome.
Cognitive Neuropsychology, 15 (6/7/8), 771–811.
Jackendoff, R. (1977). X’ Syntax: A Theory of Phrase Structure. Cambridge, MA: MIT Press.
Jackendoff, R. (2002). Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford
University Press.
Jackendoff, R. (2007). Language, Consciousness, Culture: Essays on Mental Structure. Cambridge, MA:
MIT Press.
Jackendoff, R. (2009). Parallels and nonparallels between language and music. Music Perception, 26 (3),
195–204. doi: 10.1525/mp.2009.26.3.195
Jackendoff, R. (2011). What is the human language faculty? Two views. Language, 87 (3), 586–624.
Jackendoff, R., & Pinker, S. (2005). The nature of the language faculty and its implications for evolution of
language (Reply to Fitch, Hauser, and Chomsky). Cognition, 97 (2), 211–225.
doi: 10.1016/j.cognition.2005.04.006
Jeon, H.-A. (2014). Hierarchical processing in the prefrontal cortex in a variety of cognitive domains. Frontiers
in Systems Neuroscience, 8. doi: 10.3389/fnsys.2014.00223
Joshi, A. K., & Schabes, Y. (1997). Tree-Adjoining Grammars. In G. Rozenberg & A. Salomaa (Eds.),
Handbook of Formal Languages: Volume 3 Beyond Words (pp. 69–123). Berlin, Heidelberg: Springer.
doi: 10.1007/978-3-642-59126-6_2
Kayne, R. S. (1994). The Antisymmetry of Syntax. Cambridge, MA: MIT Press.
Kayne, R. S. (2011). Why are there no directionality parameters? In Proceedings of WCCFL (Vol. 28, pp.
1–23).
Koechlin, E., & Jubault, T. (2006). Broca’s area and the hierarchical organization of human behavior.
Neuron, 50 (6), 963–974. doi: 10.1016/j.neuron.2006.05.017
Kuperberg, G. R. (2020). Tea with milk? A hierarchical generative framework of sequential event
comprehension. Topics in Cognitive Science, 13 (1), 256–298. doi: 10.1111/tops.12518
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral mechanisms
in behavior (pp. 112–131). New York, NY: Wiley.
Lasnik, H. (2000). Syntactic Structures Revisited: Contemporary Lectures on Classic Transformational
Theory. Cambridge, MA: MIT Press.
Maffongelli, L., D’Ausilio, A., Fadiga, L., & Daum, M. M. (2019). The ontogenesis of action syntax. Collabra:
Psychology, 5 (1), 21. doi: 10.1525/collabra.215
Marcus, G. F. (2006). Cognitive architecture and descent with modification. Cognition, 101 (2), 443–465.
doi: 10.1016/j.cognition.2006.04.009
Martin, A. E. (2016). Language processing as cue integration: Grounding the psychology of language in
perception and neurophysiology. Frontiers in Psychology, 7. doi: 10.3389/fpsyg.2016.00120
Martin, A. E. (2020). A compositional neural architecture for language. Journal of Cognitive Neuroscience,
32 (8), 1407–1427. doi: 10.1162/jocn_a_01552
Martin, A. E., & Doumas, L. A. A. (2017). A mechanism for the cortical computation of hierarchical linguistic
structure. PLOS Biology, 15 (3), e2000663. doi: 10.1371/journal.pbio.2000663
Martins, M. D. (2012). Distinctive signatures of recursion. Philosophical Transactions of the Royal Society
B: Biological Sciences, 367 (1598), 2055–2064. doi: 10.1098/rstb.2012.0097
Martins, M. D., Bianco, R., Sammler, D., & Villringer, A. (2019). Recursion in action: An fMRI study on
the generation of new hierarchical levels in motor sequences. Human Brain Mapping, 40 (9), 2623–2638.
doi: 10.1002/hbm.24549
Matchin, W., & Hickok, G. (2020). The cortical organization of syntax. Cerebral Cortex, 30 (3), 1481–1498.
doi: 10.1093/cercor/bhz180
Mazurkiewicz, A. (1995). Introduction to Trace Theory. In V. Diekert & G. Rozenberg (Eds.), The Book of
Traces (pp. 3–41). Singapore: World Scientific. doi: 10.1142/9789814261456_0001
McRae, K., Brown, K. S., & Elman, J. L. (2019). Prediction-based learning and processing of event knowledge.
Topics in Cognitive Science, 13 (1), 206–223. doi: 10.1111/tops.12482
Meyer, L., Sun, Y., & Martin, A. E. (2019). Synchronous, but not entrained: Exogenous and endogenous
cortical rhythms of speech and language processing. Language, Cognition and Neuroscience, 35 (9),
1089–1099. doi: 10.1080/23273798.2019.1693050
Miller, G. A., Galanter, E., & Pribram, K. H. (1960). Plans and the Structure of Behavior. New York, NY:
Holt, Rinehart and Winston.
Moro, A. (2014a). On the similarity between syntax and actions. Trends in Cognitive Sciences, 18 (3),
109–110. doi: 10.1016/j.tics.2013.11.006
Moro, A. (2014b). Response to Pulvermüller: The syntax of actions and other metaphors. Trends in Cognitive
Sciences, 18 (5), 221. doi: 10.1016/j.tics.2014.01.012
Moro, A. (2015). The Boundaries of Babel: The Brain and the Enigma of Impossible Languages. Cambridge,
MA: MIT Press.
Norman, D. A. (1981). Categorization of action slips. Psychological Review, 88 (1), 1–15.
doi: 10.1037/0033-295X.88.1.1
O’Donnell, T. J., Hauser, M. D., & Fitch, W. T. (2005). Using mathematical models of language
experimentally. Trends in Cognitive Sciences, 9 (6), 284–289. doi: 10.1016/j.tics.2005.04.011
Pagin, P., & Westerståhl, D. (2010). Compositionality I: Definitions and Variants. Philosophy Compass,
5 (3), 250–264. doi: 10.1111/j.1747-9991.2009.00228.x
Papitto, G., Friederici, A. D., & Zaccarella, E. (2020). The topographical organization of motor processing:
An ALE meta-analysis on six action domains and the relevance of Broca’s region. NeuroImage, 206,
116321. doi: 10.1016/j.neuroimage.2019.116321
Partee, B. H. (1995). Lexical semantics and compositionality. In L. R. Gleitman & M. Liberman (Eds.),
Language: An invitation to cognitive science, Vol. 1, 2nd ed (pp. 311–360). Cambridge, MA, US: The
MIT Press.
Partee, B. H., ter Meulen, A., & Wall, R. E. (1993). Mathematical methods in linguistics (Vol. 30). Dordrecht:
Kluwer Academic Publishers.
Pulvermüller, F. (2014). The syntax of action. Trends in Cognitive Sciences, 18 (5), 219–220.
doi: 10.1016/j.tics.2014.01.001
Pulvermüller, F., & Fadiga, L. (2010). Active perception: Sensorimotor circuits as a cortical basis for
language. Nature Reviews Neuroscience, 11 (5), 351–360. doi: 10.1038/nrn2811
Reason, J. T. (1979). Actions not as planned. In G. Underwood & R. Stevens (Eds.), Aspects of Consciousness
(pp. 67–90). London: Academic Press.
Reinhart, T. (1983). Anaphora and Semantic Interpretation. London: Croom Helm.
Rizzi, L. (2004). On the study of the language faculty: Results, developments, and perspectives. The
Linguistic Review, 21 (3-4), 323–344. doi: 10.1515/tlir.2004.21.3-4.323
Rizzi, L. (2013). Introduction: Core computational principles in natural language syntax. Lingua, 130, 1–13.
doi: 10.1016/j.lingua.2012.12.001
Rosenbaum, D. A., Cohen, R. G., Jax, S. A., Weiss, D. J., & van der Wel, R. (2007). The problem of
serial order in behavior: Lashley’s legacy. Human Movement Science, 26 (4), 525–554.
doi: 10.1016/j.humov.2007.04.001
Saito, M., & Fukui, N. (1998). Order in phrase structure and movement. Linguistic Inquiry, 29 (3), 439–474.
doi: 10.1162/002438998553815
Schwartz, M. F. (2006). The cognitive neuropsychology of everyday action and planning. Cognitive
Neuropsychology, 23 (1), 202–221. doi: 10.1080/02643290500202623
Steedman, M. (2000). The Syntactic Process. Cambridge, MA: MIT Press.
Steedman, M. (2002). Plans, affordances, and Combinatory Grammar. Linguistics and Philosophy, 25 (5),
723–753. doi: 10.1023/A:1020820000972
Stout, D., & Chaminade, T. (2009). Making tools and making sense: Complex, intentional behaviour in
human evolution. Cambridge Archaeological Journal, 19 (1), 85–96. doi: 10.1017/S0959774309000055
Tettamanti, M., & Moro, A. (2012). Can syntax appear in a mirror (system)? Cortex, 48 (7), 923–935. doi:
10.1016/j.cortex.2011.05.020
Tettamanti, M., & Weniger, D. (2006). Broca’s area: A supramodal hierarchical processor? Cortex, 42 (4),
491–494. doi: 10.1016/S0010-9452(08)70384-8
Uddén, J., & Bahlmann, J. (2012). A rostro-caudal gradient of structured sequence processing in the left
inferior frontal gyrus. Philosophical Transactions of the Royal Society B: Biological Sciences, 367 (1598),
2023–2032. doi: 10.1098/rstb.2012.0009
Uithol, S., van Rooij, I., Bekkering, H., & Haselager, P. (2012). Hierarchies in action and motor control.
Journal of Cognitive Neuroscience, 24 (5), 1077–1086. doi: 10.1162/jocn_a_00204
van Rooij, I., & Blokpoel, M. (2020). Formalizing verbal theories. Social Psychology, 51 (5), 285–298. doi:
10.1027/1864-9335/a000428
Watumull, J., Hauser, M. D., Roberts, I. G., & Hornstein, N. (2014). On recursion. Frontiers in Psychology,
4. doi: 10.3389/fpsyg.2013.01017
Zaccarella, E., Papitto, G., & Friederici, A. D. (2021). Language and action in Broca’s area: Computational
differentiation and cortical segregation. Brain and Cognition, 147, 105651.
doi: 10.1016/j.bandc.2020.105651
Zacks, J. M., & Tversky, B. (2001). Event structure in perception and conception. Psychological Bulletin,
127 (1), 3. doi: 10.1037/0033-2909.127.1.3