Sag Ivan
Abstract
This paper presents a theory of syntactically flexible phrasal idioms that
explains the properties of such phrases, e.g. keep tabs on, spill the beans, in
terms of general combinatoric restrictions on the individual idiomatic words
(more precisely, the lexemes) that they contain, e.g. keep, tabs, on. Our lexical
approach, taken together with a construction-based grammar, provides a
natural account of the different classes of English idioms and of the
idiosyncrasies distinguishing among particular phrasal idioms.
1 Introduction
Among the most important desiderata for a theory of idioms, also known as multi-
word expressions (MWEs), are the following four. First, idioms divide into classes
whose distinct properties, described below, need to be theoretically accommo-
dated. Second, the theory should get the semantics right. It should, for example,
represent the fact that (1a) and (1b) have roughly the same meaning, which can be
glossed as (1c):1
0
We would like to thank Frank Richter, Manfred Sailer, and Thomas Wasow for discussion of
some of the issues raised here, Stephen Wechsler for comments on an earlier draft, and especially
Stefan Müller for detailed comments and extensive discussion. We are also indebted for highly
pertinent comments to two anonymous referees for the Journal of Linguistics. None are to be
blamed for the good advice we have failed to accept.
1
All unstarred examples in this paper were attested on the web at the time of drafting, unless
otherwise noted.
(1) a. let the cat out of the bag
b. spill the beans
c. reveal a/the secret(s)
Third, the theory should get the syntax right. For example, it should predict that
(2a) is utterly unacceptable while (2b) is not, despite the fact that (3a) and (3b)
mean roughly the same thing.2 We know that idiomatic ghost is meaningful inde-
pendently of the fact that it passivizes because, unlike idiomatic bucket, it can be
semantically modified, as illustrated in (4).
(4) a. Gonna be out of video producing for a lil bit,my camera gave up the
recording ghost yesterday...
b. I took a single year of classes before my parents gave up the budgetary
ghost...
c. Who now remembers the general relief when the increasingly odd Blair
gave up the political ghost?
Finally, the theory of idioms should be part and parcel of a general theory of
grammar; the theory of idioms should add as little machinery to one’s overall
grammatical approach as possible, ideally nothing. We will expand on all these
desiderata using the framework of Sign-Based Construction Grammar (SBCG),
as developed in Boas and Sag (2012) (see in particular Sag 2012). Our theory
of idioms will be lexicalist; in particular, we will divide the lexicon into words
2
It is difficult to find two idioms with similar extensions only one of which passivizes. For
readers who find (2b) unacceptable, a quick Google search will reveal that examples of the passive
form of this idiom are not hard to find. Most examples appear to be metaphorical, about abandon-
ment of projects or activities rather than actual deaths, but there are some examples about actual
loss of life as well, e.g.
(i) Then, in January of ’72, the ghost was given up. It’s still difficult to write about Patchen’s death.
(ii) Gazing into that jar. . . the poison as pure as Alpine snow. . . he had witnessed. . . the awful instant
when the ghost was given up, the wings settling down into the first magnificence of death.
that appear only in idioms, such as cat, bag, spill and beans in (1) and the words
that are pronounced exactly like these that appear in nonidiom contexts such as
in (5) (cf. Nunberg et al. 1994 [henceforth NSW] and also Wasow et al. 1984).
O’Grady (1998), in a dependency-based approach, introduced the concept of a
chain of lexical dependencies – christened a “catena” by Osborne (2012) – as the
underlying structure of idioms. The basic intuition is similar to that of the current
approach, although many details of the implementation are different.
(5) The vet held onto the dog and let the cat out of the bag. (invented example)
(6) The Pam Anderson beans were spilled long ago, but who’s joining the Bay-
watch babe in the Dancing With the Stars spotlight this year?
(7) *John Dean refused to keep the Watergate beans.
Several dimensions of difference among idioms have been observed in the lit-
erature (for reviews, see Fillmore et al. 1988 and Sag et al. 2002 and the references
cited therein). The dimension we will focus on primarily is the general correlation
between the syntactic plasticity of an idiom and its semantic compositionality.3 In
footnote 2 above, we mentioned that kick of kick the bucket does not passivize,
whereas give of give up the ghost shows some signs of passivizing. We noted
above that these expressions convey roughly the same meaning but, crucially, in
the kick the bucket case the meaning is presumably a simple predicate ‘die’, while
in the give up the ghost case it appears to be a more complex, decomposable
predicate like ‘lose (one’s) life/soul’. NSW’s hypothesis, which we pursue fur-
ther here, is that meaningfulness of the words of an idiom correlates with their
syntactic potential; specifically, meaningful idiom words can be modified and can
appear in syntactic contexts that meaningless ones cannot. Noting observations
by Ackerman & Webelhuth and by Shenk regarding limited syntactic flexibility in
semantically non-decomposable idioms in German and Dutch, NSW observe that
the structures in these languages calling for special devices to “scramble” con-
stituents – such as those proposed by Reape and Kathol (later published as Reape
1994 and Kathol 2000) – and which do not entail interpretive consequences (as
does, for example, English topicalization) are precisely the syntactic contexts that
permit meaningless idiom words to be “displaced.”
Following Sag et al. (2002), we distinguish three types of (English) idioms:
Fixed Expressions: Expressions which appear to contain more than one word
but which display idiosyncratic syntactic combination (and a fortiori no semantic
compositionality). Examples include by and large, right away, first off, all of a
sudden. These can, without loss of generality, be listed as single words in the
lexicon, despite their spelling suggesting a multiword history.
Semi-Fixed Expressions: Idioms which obey normal syntactic constraints but
which are nonetheless quite frozen as well as semantically non-compositional.
Examples are black and white, kith and kin, bright-eyed and bushy-tailed, it takes
one to know one. One is tempted to accord members of this class the syntactic
structures that they appear to exemplify, but since they strongly resist both mor-
phological and syntactic manipulation, parsimony argues for encoding them as
single words. However, since they can be interrupted by epistemic or intensifying
adverbs, they cannot be considered single words:4
3
This correlation was noted originally by NSW; Abeillé 1995 also provides a TAG treatment
of rich data on partial syntactic flexibility of idioms in French.
4
The appearance of ‘epithets’ within these idioms, as in kith and bloody kin, cannot be taken
as evidence of phrasal structure if it is found that they obey the same phonological constraints as
those interrupting uncontroversial words, e.g. out-bloody-rageous, fan-freakin-tastic. We do not
attempt to answer that question here.
(8) a. It sure/surely/certainly takes one to know one.
b. So we huddle together as kith and as kin.
c. bright eyed and totally/completely bushy tailed
The class of semi-fixed expressions also includes idioms that are syntactically
frozen and semantically noncompositional but morphologically alternating. Rel-
evant examples are kick/kicks/kicked/kicking the bucket and buy/buys/bought/
buying a pig in a poke. The inflectional potential of kick, for example, shows that
kicked the bucket is not a fixed expression, but rather a verb phrase constructed ac-
cording to the familiar English pattern. After introducing the next and final type,
we return to the analysis of this and other semi-fixed expressions.
Syntactically Flexible Expressions: Idioms with the following properties:
Parade examples are pull strings and spill the beans, which have been analyzed
as containing special idiom words pull ‘manipulate’, strings ‘access’, and spill
‘reveal’, beans ‘secret(s)’. Idioms of this type are widely taken to have the same
syntactic structure as the homophonous nonidiomatic expressions, which in turn
explains why they can be inflected, modified and dislocated syntactically. This is
the type of idiom that we will be mostly concerned with in this paper.
We begin the introduction of the main features of SBCG in a preliminary dis-
cussion of flexible expressions. Then, as noted, we take up an example of a semi-
fixed expression and finally return to some complex flexible expressions. Fixed
expressions are not discussed further.
example, Kay and Fillmore (1999), should find nothing very surprising in the tools
and notations introduced in this paper.6
The most important type of feature structure in SBCG is the sign, with its
various subtypes: word, lexeme and phrase (Sag 2012:71). Each sign has a FORM
feature, whose value is a morphological representation of the expression,
notated here in standard English orthography. The other features of the sign are
PHON(ology), ARG-ST (argument-structure), SYN(tax), SEM(antics), and CONTEXT.
The value of SYN is a feature structure that specifies a value for features like
CAT(egory), VAL(ence), and MRKG (marking). CAT values are feature structures,
assigned to various word-class types, that specify values for appropriate features,
including Lexical Identity (LID), whose value is a list of frames.7
The value of the ARG-ST feature is a list of the arguments – syntactic and/or
semantic – of a predicating lexeme. Members of the ARG-ST list reappear in the
list value of the VAL feature, unless extracted or given null realization.8
The VAL feature takes as value a list of signs, corresponding to the elements
that an expression can combine with: the syntactico-semantic arguments of the
predicator in order of increasing obliqueness, i.e. decreasing accessibility
(⟨Subject, Direct Object, ...⟩). Expressions like NPs and clauses have the empty
list as their VAL value, as they are intuitively ‘saturated’; i.e. they already
contain (canonically, at least) all of the predicator’s arguments.
The values of SEM include specifications for the features IND(ex) and FRAMES.
We assume an indefinitely large list of referential indices 1, 2, .... Non-referring
expressions such as idiomatic bucket receive the specification [SEM [INDEX none]].
The FRAMES feature takes a list of elementary predications9 as its value. In
certain cases, the FRAMES list of an expression is empty.
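For readers who find a data-structure rendering helpful, the feature geometry just described can be approximated as follows. This is only a minimal sketch, not part of SBCG itself: the class names, the reduction of CAT to a string label, the default marking value "unmarked" and the frame name kim-fr are all assumptions made for illustration, and PHON, ARG-ST and CONTEXT are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Syn:
    cat: str                                         # CAT, reduced here to a category label
    lid: List[str] = field(default_factory=list)     # LID: a list of frame names
    val: List["Sign"] = field(default_factory=list)  # VAL: signs still to be combined with
    mrkg: str = "unmarked"                           # MRKG; this default name is an assumption


@dataclass
class Sem:
    index: Optional[str] = None                      # None stands for [INDEX none]
    frames: List[str] = field(default_factory=list)  # FRAMES: elementary predications


@dataclass
class Sign:
    form: List[str]
    syn: Syn
    sem: Sem


def is_saturated(sign: Sign) -> bool:
    """A sign is 'saturated' when its VAL list is empty."""
    return not sign.syn.val


# Two referential NPs: empty VAL, an index, and one frame each.
kim = Sign(["Kim"], Syn("NP", lid=["kim-fr"]), Sem("x", ["kim-fr"]))
the_beans = Sign(["the", "beans"], Syn("NP", lid=["beans-fr"]), Sem("y", ["beans-fr"]))

# Idiomatic bucket: no index and an empty FRAMES list.
idiom_bucket = Sign(["bucket"], Syn("noun", lid=["i-bucket[null]-fr"]), Sem())

# A transitive verb still looking for its subject and object is not saturated.
spill = Sign(["spill"], Syn("verb", lid=["spill-fr"], val=[kim, the_beans]),
             Sem("s", ["spill-fr"]))
```

The VAL list in this sketch holds fully specified signs for brevity; in the theory it holds partial descriptions that the actual arguments must satisfy.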
Our analysis appeals to a frame-based conception of semantics (Fillmore 1982,
1985, Fillmore and Baker 2010), in which we assume a bifurcation of the universe
of frames into canonical frames (c-frames) and idiom-frames (i-frames). For ex-
ample, i-beans[secret]-fr models the idiomatic sense of beans in spill the beans.
6
For those familiar with the latter but not the former, read the embedded bracketed
attribute-value matrices (AVMs) as embedded boxes. Angled brackets (⟨...⟩) denote lists.
7
Typically this will be a singleton list, but the list definition is adopted in anticipation of pos-
sible cases in which non-singleton frame lists for listemes or derived lexemes should arise. CAT
specifications ‘percolate’ in headed constructs from the head lexeme to higher projections. See
Sag (2012) for details.
8
This paper is not concerned with extraction or null realization processes, nor with the PHON
or CONTEXT features. Again, see Sag (2012).
9
Roughly the RELS of Minimal Recursion Semantics (MRS) in Copestake et al. (2005).
It is an i-frame that can be loosely paraphrased as secret and is distinct from the
nonidiomatic sense of beans, which we represent simply as beans-fr, a c-frame.10
(10) (i) Shah Rukh Khan spilled many beans about his upcoming film Happy
New Year which has huge starcast of Deepika Padukone, Abhishek
Bachchan, Sonu Sood and Boman Irani.
(ii) Instead of throwing a rock, he is throwing all her family secrets and dirty
laundry out to the media. He enjoys it. You can see it in his smirk as he
spills bean after bean of little juicy nasty tidbits about Palin.
(iii) MilSuper...can you spill a bean or two?
Lexemes are identified in our theory in terms of a feature LEXICAL-ID (LID),
whose value identifies each lexeme (and the phrases it projects) by its FRAMES
value.11 In the case of nonidiomatic lexemes and meaningful idiomatic lexemes,
the LID value is the main frame of the lexeme.12
(11) [ word
       FORM  ⟨beans⟩
       SYN   NP[LID ⟨[1] [beans-fr, THEME x]⟩]
       SEM   [INDEX x, FRAMES ⟨[1]⟩] ]

     [ word
       FORM  ⟨beans⟩
       SYN   NP[LID ⟨[1] [i-beans[secret]-fr, THEME x]⟩]
       SEM   [INDEX x, FRAMES ⟨[1]⟩] ]
with that of the phrase it heads by the Head Feature Principle (Pollard and Sag
1994) or some similar constraint, as indicated in (12):
(12) Mother:
       [ FORM  ⟨the, beans⟩
         SYN   NP[LID ⟨[2]⟩]
         SEM   [INDEX x, FRAMES ⟨..., [2] [i-beans[secret]-fr, THEME x]⟩] ]
     Daughters:
       [ FORM  ⟨the⟩
         SYN   DET
         SEM   ... ]
       [ FORM  ⟨beans⟩
         SYN   noun[LID ⟨[2]⟩]
         SEM   [INDEX x, FRAMES ⟨[2]⟩] ]
The basic mechanism for preventing idiom words from appearing in contexts
where they are not lexically governed by the appropriate idiom predicator involves
the feature VALence. Ordinary, non-idiom predicators are lexically specified as
requiring all members of their VAL list to be nonidiomatic. By contrast, an id-
iomatic predicator like idiomatic spill requires a direct object that is idiomatic,
i.e. one whose LID value (and hence that of its lexical head) contains a particular
i-frame, as in (13b).13
13
We follow the abbreviation conventions of Copestake et al. (2005) in (i) for the correspond-
ing frame descriptions shown in (ii). Here and throughout, UNDGR abbreviates UNDERGOER.
Notice that we use s variables for quantifying over situations, which function as Davidsonian
event arguments, and we omit the feature LABEL except where necessary for label identification:
(i)  l:beans(x)   l:spill(s,x,y)   l:i-beans[secret](x)   l:spill[reveal](s,x,y)
(ii) [beans-fr, LABEL l, THEME x]
     [spill-fr, LABEL l, SIT s, ACTOR x, UNDGR y]
     [i-beans[secret]-fr, LABEL l, THEME x]
     [spill[reveal]-fr, LABEL l, SIT s, ACTOR x, UNDGR y]
Representations of feature structures (Sag 2012:74ff) are enclosed in boxes; representations of
descriptions of feature structures (including constructions) are not enclosed in boxes.
(13) a. [ trans-v-lxm
          FORM  ⟨spill⟩
          SYN   [ VAL ⟨NP_x[LID ⟨c-frame⟩], NP_y[LID ⟨c-frame⟩]⟩
                  LID ⟨[1] spill(s,x,y)⟩ ]
          SEM   [ INDEX s
                  FRAMES ⟨[1]⟩ ] ]
     b. [ trans-v-lxm
          FORM  ⟨spill⟩
          SYN   [ VAL ⟨NP_x[LID ⟨c-frame⟩], NP_y[LID ⟨i-beans[secret]⟩]⟩
                  LID ⟨[1] spill[reveal](s,x,y)⟩ ]
          SEM   [ INDEX s
                  FRAMES ⟨[1]⟩ ] ]
Because these predicators and their complements also pass their frames up to
be part of the meaning of the phrases they license (in accordance with general
principles of semantic composition), the idiomatic sense of the direct object beans
will be part of the phrase’s meaning only when the predicator spill has introduced
the appropriate i-frame. A nonidiomatic object can only cooccur with predicators
that fail to require an idiomatic object.
Note that predicators like idiomatic spill are themselves identified via c-frames,
which allows the phrases they project to occur in nonidiomatic contexts. The
reader should bear in mind throughout that the basic mechanism allowing idiom
words to occur just when governed by the appropriate lexical governor is that
non-idiom predicators are lexically specified to accept only arguments whose LID
value is canonical, as in (13a), which will always lead to a nonidiomatic VP like
(14). The idiomatic (13b), by contrast, will project structures like (15), where the
semantics includes the direct object’s i-frame.
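The licensing logic described above can be made concrete with a small sketch. It assumes a toy type hierarchy in which each frame name points to its immediate supertype, and it reduces each valent requirement to the frame type demanded of the argument's LID. The dictionary layout, the function names and the frame name kim-fr are illustrative assumptions, not notation from the analysis.

```python
# Toy frame hierarchy: each frame name maps to its immediate supertype.
SUPERTYPE = {
    "c-frame": "frame",
    "i-frame": "frame",
    "kim-fr": "c-frame",             # hypothetical frame for a subject NP
    "beans-fr": "c-frame",
    "spill-fr": "c-frame",
    "spill[reveal]-fr": "c-frame",   # idiomatic spill is itself identified via a c-frame
    "i-beans[secret]-fr": "i-frame",
}


def is_subtype(frame, ancestor):
    """True if `frame` equals `ancestor` or lies below it in the hierarchy."""
    while frame is not None:
        if frame == ancestor:
            return True
        frame = SUPERTYPE.get(frame)
    return False


# Each predicator records its own LID and the LID type required of each valent.
SPILL_LITERAL = {"lid": "spill-fr", "val": ["c-frame", "c-frame"]}
SPILL_IDIOM = {"lid": "spill[reveal]-fr", "val": ["c-frame", "i-beans[secret]-fr"]}


def licenses(predicator, arg_lids):
    """Check each argument's LID against the corresponding valent requirement."""
    required = predicator["val"]
    return len(required) == len(arg_lids) and all(
        is_subtype(arg, req) for arg, req in zip(arg_lids, required)
    )


print(licenses(SPILL_LITERAL, ["kim-fr", "beans-fr"]))            # literal spill + literal beans
print(licenses(SPILL_LITERAL, ["kim-fr", "i-beans[secret]-fr"]))  # literal spill + idiomatic beans
print(licenses(SPILL_IDIOM, ["kim-fr", "i-beans[secret]-fr"]))    # idiomatic spill + idiomatic beans
```

Because spill[reveal]-fr is placed under c-frame in this sketch, a VP projected by idiomatic spill would satisfy the c-frame requirement of an ordinary embedding predicator, which mirrors the point made about (13b).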
(14) Mother:
       [ FORM  ⟨spilled, the, beans⟩
         SYN   VP[ CAT [LID ⟨[1]⟩]
                   VAL ⟨[5] NP_x⟩ ]
         SEM   [ INDEX s
                 FRAMES ⟨[1] l4:spill(s,x,y), exist(s,l1,l2), l1:past(s),
                         l2:the(y,l3,l4), [2] l3:beans(y)⟩ ] ]
     Daughters:
       [ FORM  ⟨spilled⟩
         SYN   V[ CAT [LID ⟨[1]⟩]
                  VAL ⟨[5], [6]⟩ ]
         SEM   [ INDEX s
                 FRAMES ⟨[1], exist(s,l1,l2), l1:past(s)⟩ ] ]
       [6] [ FORM  ⟨the, beans⟩
             SYN   NP[LID ⟨[2]⟩]
             SEM   [ INDEX y
                     FRAMES ⟨l2:the(y,l3,l4), [2]⟩ ] ]
(15) Mother:
       [ FORM  ⟨spilled, the, beans⟩
         SYN   VP[ CAT [LID ⟨[1]⟩]
                   VAL ⟨[5] NP_x⟩ ]
         SEM   [ INDEX s
                 FRAMES ⟨[1] l4:spill[reveal](s,x,y), exist(s,l1,l2), l1:past(s),
                         l2:the(y,l3,l4), [2] l3:i-beans[secret](y)⟩ ] ]
     Daughters:
       [ FORM  ⟨spilled⟩
         SYN   V[ CAT [LID ⟨[1]⟩]
                  VAL ⟨[5], [6]⟩ ]
         SEM   [ INDEX s
                 FRAMES ⟨[1], exist(s,l1,l2), l1:past(s)⟩ ] ]
       [6] [ FORM  ⟨the, beans⟩
             SYN   NP[LID ⟨[2]⟩]
             SEM   [ INDEX y
                     FRAMES ⟨l2:the(y,l3,l4), [2]⟩ ] ]
Our analysis thus treats Kim spilled the beans as ambiguous. (13a) will give rise
to an S whose FRAMES list is as shown in (16a), and (13b) will likewise project
an S whose FRAMES list is (16b):
The ls here are just labels (see Copestake et al. 2005) that can be eliminated by
replacing an argument l by the formula labelled by l, as shown in (17):
(17b) is the idiomatic reading, which can be read as ‘there is some past situation
in which there is a contextually provided secret that Kim reveals’.
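The label-elimination step can be sketched as a simple recursive substitution. The example below takes the labelled frames from the idiomatic VP in (15) and, as an assumption made for illustration, writes Kim in place of the subject variable x; the tuple encoding and the function names are likewise assumptions rather than part of the formalism.

```python
def resolve(term, labelled):
    """Replace every label argument in `term` by the formula that label names."""
    pred, args = term
    new_args = []
    for arg in args:
        if arg in labelled:
            new_args.append(resolve(labelled[arg], labelled))
        else:
            new_args.append(arg)
    return (pred, new_args)


def show(term):
    """Render a nested (predicate, arguments) term as a formula string."""
    pred, args = term
    parts = [arg if isinstance(arg, str) else show(arg) for arg in args]
    return pred + "(" + ", ".join(parts) + ")"


# Labelled frames of the idiomatic reading, with Kim substituted for x.
labelled = {
    "l1": ("past", ["s"]),
    "l2": ("the", ["y", "l3", "l4"]),
    "l3": ("i-beans[secret]", ["y"]),
    "l4": ("spill[reveal]", ["s", "Kim", "y"]),
}
top = ("exist", ["s", "l1", "l2"])   # the one frame that carries no label

print(show(resolve(top, labelled)))
# exist(s, past(s), the(y, i-beans[secret](y), spill[reveal](s, Kim, y)))
```

The resulting formula corresponds to the gloss of the idiomatic reading given above: some past situation in which Kim reveals a contextually provided secret.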
(18) a.*He was kicking the bucket when she walked in.
b. Erasmus seized him by the throat, threw him down, and giving him a
violent kick in the neck, a rattle in the throat announced that he was
giving up the ghost.
c. Lowe and Cushing sank bleeding to the basket floor as the balloon was
giving up the ghost, its descent accelerating as the weight of its fabric
overpowered the . . .
Example (19) shows that the nominal gerund is possible for kick the bucket, so
if the gerund and present participle are taken to employ the same morphological
object, the progressive cannot be blocked morphologically.
(19) I think the correlation between her kicking the bucket and the surgery is a
daft point.
The first interesting question that arises is whether to attribute the meaning
‘die’ to kick alone or to the full VP kick(s/ed/ing) the bucket. We propose to
analyze kick the bucket by positing two lexemes: an idiom-predicator verb kick
meaning ‘die’ (punctually), and a meaningless noun bucket,14 which kick requires
as the head of its object, marked by the.
To account for the absence of Passive, one might simply posit a restriction on
the VERBFORM (VF) value of kick. Such an analysis, apart from being stipulative,
does not yet deal with the entire problem raised here, since Passive is not the only
construction where this idiom is blocked. No idiomatic readings are available for
any of the examples in (20), for example:
We propose that Questions, WH-clefts, It-clefts and Middle all impose semantic
or pragmatic constraints on the (displaced) direct object of the verb, which would
make these constructions incompatible with any analysis of idiomatic kicked that
required a semantically vacuous direct object. The relevant fact is not that exple-
tive nouns cannot occur as passive subjects, which in fact they can.
(21) a. There was believed to be another worker at the site besides the neighbors
who witnessed the incident.
b. It was rumored that Great Britain, in apparent violation of the terms of
the Clayton-Bulwer treaty, had taken possession of certain islands in the
Bay of Honduras.
14
A referee, noting the existence of bucket list, raises the question of including all the semantics
of the idiom in idiomatic kick. Motivation for this move comes from the observation that there
exist several variants of the idiom in which bucket is lacking.
(i) The old guy kicked off, eh?
(ii) ... you were morally obligated to eat the flesh of Aunt Rose when she kicked it - regardless
how she died (or what you thought of Aunt Rose).
(iii) I’m old but, before I kick, I’m going to visit Montana where, from what I’m told, there’s
no friggin city lights and just lie on my back and look at the night sky.
It is rather that passive does not apply to verbal lexemes that select an expletive
direct object.15
(22) a. *It is just being winged (essentially) by this man. (Cf. attested: Even
this man is essentially just winging it.)
b. *It was totally lost by John and Kristen when one answer was given!
(Cf. attested: John and Kristen totally lost it when one answer was
given!)
c. *It was blown by Becca when she wore a juicy couture jumpsuit to the
final rose ceremony. (Cf. attested: Becca blew it when she wore a juicy
couture jumpsuit to the final rose ceremony.)
This strictly lexical analysis of kick the bucket accounts in addition for both the
inflectional freedom of the head kick and the fact that the bucket NP can be inter-
rupted by metalinguistic operators like proverbial, metaphorical, etc., as in (23):
(23) a. I feel like I’ve seen so much of the world already, but there’s still a lot I
want to do before I kick the proverbial bucket. . .
b. My Kindle kicked the metaphorical bucket on the 25th.16
Since idiomatic kick specifies about its NP complement only that it is an NP
headed by the idiom word bucket, the insertion of epithets or semantically external
modifiers in examples like those in (23) causes no problem, while passivization
and semantic modification of the idiomatic word bucket are precluded by that
noun’s lexically specified meaninglessness.17
We now turn to our analysis of the idiomatic common noun bucket in SBCG
terms. This involves a listeme (lexical entry) that is like the one for nonidiomatic
bucket, except that (1) the FRAMES list is empty, (2) the INDEX value is none and
(3) the frame contained in the LID value is of type i-bucket[null]-fr:
15
We are indebted to an anonymous referee for the observation that expletives selected by verbal
listemes are blocked from becoming passive subjects. The Passive Construction is a derivational
construction containing a mother lexeme and a single daughter lexeme. The constraint on pas-
sivizing expletive objects is implemented by assigning a value distinct from none to the semantic
feature INDEX of the daughter verbal lexeme in the Passive Construction.
16
As a referee points out, kick the bucket can also be interrupted by domain-delimiting adjectives
(Ernst 1981), e.g., In a metaphorical sense, Ingraham has kicked the political bucket. We take such
cases to represent, not modification of bucket, but exemplification of the external modification
trope whose parade example is P.G. Wodehouse’s pensive cigarette.
17
The metalinguistic modifier can nevertheless select the idiomatic word bucket by selecting for
its LID value, which indicates a subtype of i-bucket[null]-fr.
(24) [ cn-lxm
       FORM  ⟨bucket⟩
       SYN   [ CAT [LID ⟨i-bucket[null]-fr⟩]
               VAL ⟨⟩ ]
       SEM   [ IND none
               FRAMES ⟨⟩ ] ]
The label cn-lxm indicates that idiomatic bucket is a common noun lexeme, a
subtype of lexeme. The first syntactic specification requires that the LID feature
have the value ⟨i-bucket[null]-fr⟩, the list containing just a feature structure of
type i-bucket[null]-fr. As indicated above, the type hierarchy recognizes two kinds
of frames: canonical frames (subtypes of c-frame) and idiomatic frames (subtypes
of i-frame). In
the case of meaningless idiomatic lexemes, those with an empty FRAMES list and
none as their INDEX value, like idiomatic bucket, the LID value is a semantically
inert i-frame that simply identifies the lexeme for purposes of selection by the
relevant predicator. As before, since CAT features percolate in headed constructs
from the head lexeme to higher projections, the identity of idiomatic bucket as the
head of a noun phrase like the bucket or the proverbial bucket will be visible to
idiomatic kick.
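The role of LID percolation can be illustrated with a small sketch. It assumes that a sign is a plain dictionary, that an NP simply shares its head noun's LID, INDEX and FRAMES, and that the determiner and any semantically external modifier contribute only to FORM; these simplifications, the function names and the frame name bucket-fr for the nonidiomatic noun are assumptions made for illustration.

```python
# Idiomatic bucket as in (24): inert i-frame as LID, no index, empty FRAMES.
BUCKET_IDIOM = {"form": ["bucket"], "lid": ["i-bucket[null]-fr"],
                "index": None, "frames": []}
# Nonidiomatic bucket, with a hypothetical c-frame name.
BUCKET_PLAIN = {"form": ["bucket"], "lid": ["bucket-fr"],
                "index": "x", "frames": ["bucket-fr"]}


def project_np(head, *pre_head_words):
    """Build an NP that shares its head noun's LID (head feature percolation)."""
    return {
        "form": list(pre_head_words) + head["form"],
        "lid": head["lid"],
        "index": head["index"],
        "frames": list(head["frames"]),
    }


def selected_by_idiomatic_kick(np):
    """Idiomatic kick inspects only the LID visible on its NP complement."""
    return np["lid"] == ["i-bucket[null]-fr"]


def has_referent(sign):
    """A sign with [INDEX none] offers nothing for a semantic modifier to apply to."""
    return sign["index"] is not None


np = project_np(BUCKET_IDIOM, "the", "proverbial")
print(np["form"], selected_by_idiomatic_kick(np), has_referent(np))
```

In this sketch the NP the proverbial bucket remains visible to idiomatic kick because the LID is unchanged by the intervening words, while its missing index is what rules out semantic modification.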
Before turning our attention to idiomatic kick, let us consider the construction
that licenses the NP headed by idiomatic bucket. The Head-Functor Construction,
formulated in (25), licenses constructs of type hd-func-cxt, a subtype of headed-
construct; a headed-construct specifies one of its daughters as the head daughter
and further specifies that the head daughter’s syntactic category (SYN|CAT) value
is shared by the mother, encoding the principle of head feature percolation.18
In our discussion of spill the beans, we have already sketched a construct licensed
by this construction; see (12) above.
We notice in the title line of (25) the notation ↑headed-cxt, which is to be
read as ‘headed-construct is an immediate supertype of head-functor-construct’.
Constructs play an important role in SBCG: they are like local trees licensed in
a Context-Free Grammar (CFG); similarly, the combinatoric constructions that
license these constructs are analogous to context-free rewrite rules. Since a con-
struct is modeled as a feature structure, it too is specified in AVM notation: its MTR
(MOTHER) feature’s value is a sign and its DTRS (DAUGHTERS) feature’s value is
a non-empty list of signs.
Just as ‘VP → VP ADV’ places constraints on a type of head-modifier structure
in a CFG, the Head-Functor Construction constrains a class of headed structures
in an SBCG. Note that the constraints in both cases are local, making reference
only to two levels of structure. The notation for SBCG constructions presents
a double arrow with the type name of a construct on its left, here
head-functor-construct (hd-func-cxt), and on its right an AVM expressing the
defining properties of the class of constructs named on the left. A sign is thus
licensed either by a lexical entry (listeme) or else ‘constructionally’, as the
mother of a well-formed construct.19
The defining constraint on the class of hd-func-cxts says something about the
mother, the daughters and the head-daughter. The notation in the MTR’s value
says that the SYN value is the same as that associated elsewhere with the variable
X, except that the local value has the marking value indicated by the variable
M . (The latter, ‘except’, clause is the interpretation of the ‘!’ following the X;
cf. Sag 2012:125-6, fn. 71.) The DTRS value specifies a list of two daughters.
Taking the second daughter first, its SYN value is tagged X, which, when taken
together with the occurrence of ‘X!’ in the MTR’s SYN value, says that the MTR’s
syntax value is that of its second daughter except for the specification [MRKG M ].
The second daughter is tagged Y , which identifies it as the head daughter (i.e.
the value of the HDDTR feature just below). Turning now to the first daughter
(the functor), we note that its MRKG value is M , and its SYN|CAT|SELECT value
is Y , the variable also assigned to the head daughter. The M specifies that in
constructs of type hd-func-cxt the mother inherits its MARKING value from the
non-head daughter (while it inherits the rest of its syntax from the head daughter,
19
There are also unilevel constructions in the lexicon, called lexical class constructions, which
constrain classes of lexemes; see Sag (2012):15. Lexical class constructions are exemplified below
in (45) and (53).
as we have seen). The Y indicates that the ‘functor’ daughter selects the head
daughter. In the case of idiomatic kick the bucket, this specification will allow
the to select a nominal phrase headed by idiomatic bucket. We assume that
English has a meaningless the, which occurs in this and other idioms, such as chew
the fat ‘converse’, drop the ball ‘err’, bite the dust ‘die’, fly the coop ‘escape’,
buy the farm ‘die’, and shoot the breeze ‘chat’. The listeme for meaningless the
specifies [FRAMES ⟨i-the[null]-fr⟩] and correspondingly [LID ⟨i-the[null]-fr⟩]. It
also specifies the unique marking value ⟨i-the[null]-fr⟩ (see Section 5). The
predicators of this group of idioms take NP arguments with the LID of the NP’s head
noun and the MRKG value ⟨i-the[null]-fr⟩. For example, idiomatic kick is specified
[ARG-ST ⟨NP[LID ⟨i-bucket[null]-fr⟩, MRKG ⟨i-the[null]-fr⟩]⟩]. The Head-Functor
Construction licenses the combination of non-quantifier the and idiomatic bucket
(or a bucket-headed phrase with appropriate modification) into an NP obeying the
constraints that idiomatic kick will require of its pseudo-object.
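The division of labour in a head-functor construct, as described in the preceding paragraphs, can be sketched as follows. The sketch assumes signs are dictionaries, models SELECT as a predicate over the head daughter, and does not distinguish the category labels noun and NP; the marking value "unmarked" and all function names are assumptions made for illustration.

```python
def head_functor(functor, head):
    """Return the mother of a head-functor construct, or None if unlicensed."""
    if not functor["select"](head):               # the functor must select the head daughter
        return None
    mother_syn = dict(head["syn"])                # mother's SYN is the head daughter's SYN ...
    mother_syn["mrkg"] = functor["syn"]["mrkg"]   # ... except MRKG, taken from the functor
    return {"form": functor["form"] + head["form"], "syn": mother_syn}


# Meaningless 'the': selects a nominal head and contributes its own marking value.
THE_NULL = {
    "form": ["the"],
    "syn": {"cat": "det", "lid": ["i-the[null]-fr"], "mrkg": "i-the[null]-fr"},
    "select": lambda head: head["syn"]["cat"] == "noun",
}

# Idiomatic bucket before combining with a functor; "unmarked" is a placeholder.
BUCKET = {
    "form": ["bucket"],
    "syn": {"cat": "noun", "lid": ["i-bucket[null]-fr"], "mrkg": "unmarked"},
}


def kick_accepts(np):
    """The pseudo-object requirement of idiomatic kick: bucket's LID plus the's marking."""
    syn = np["syn"]
    return syn["lid"] == ["i-bucket[null]-fr"] and syn["mrkg"] == "i-the[null]-fr"


the_bucket = head_functor(THE_NULL, BUCKET)
print(the_bucket["form"], kick_accepts(the_bucket), kick_accepts(BUCKET))
```

The bare noun fails the check only because of its marking value, which is how the sketch reflects the requirement that the object of idiomatic kick be marked by the.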
Before considering the special lexical entry for idiomatic kick we consider
the construction that licenses phrases consisting of predicators with their non-
subject complements, i.e. the construction that will enable kick to combine with
the. . . bucket. The Predicational Head-Complement Construction licenses head-
complement phrases—VPs, PPs, APs, and Common Noun Phrases (CNPs)—in
which all non-subject complement requirements (non-subject valents) of a
predicator are realized as sisters to the head. As in the Head-Functor Construction,
we note in the title line of (26) that this construction also licenses a subclass of
headed constructs:
(26) Predicational Head-Complement Construction (↑headed-cxt):
     pred-hd-comp-cxt ⇒
       [ MTR    [SYN X ! [VAL ⟨Y⟩]]
         DTRS   ⟨Z⟩ ⊕ L:nelist
         HDDTR  Z: [ word
                     SYN X: [ CAT [XARG Y]
                              VAL ⟨Y⟩ ⊕ L ] ] ]
Our earlier discussion exhibited two constructs, (14) and (15), licensed by (26).
As in the Head-Functor Construction (25), MTR, DTRS, and HDDTR features
are all constrained by (26). Looking first at the DTRS value, we note that it is a
composite list, composed of a singleton list containing a sign tagged Z, which
we see below is identified with the head daughter, followed by a non-empty list,
tagged L. Viewed as a local tree this class of constructs is characterized by a list
of one or more daughters of which the head daughter is the first. Looking down
now at the head daughter, tagged Z, we see that it is of type word and that its SYN
value, tagged X, has specifications for CAT and VAL features. The CAT feature
specifies that its external argument (XARG) is tagged Y —which we see again
in the MTR’s SYN|VAL list. In general, the XARG feature, as a CAT feature, is
projected upwards from the lexical head in all headed constructs, making visible
at all ‘bar’ levels the subject requirement of a lexical predicator. Normally, the
XARG value is identified with the first element of the VAL list, which lists the
unsaturated argument requirements of a predicator.20 The VAL list of the head
daughter consists of a singleton list containing the XARG (subject-to-be), tagged
Y , which is identified with the unique member of the VAL list of the mother,
followed by a non-empty list tagged L, whose members, we see in the DTRS value,
are identified with the non-head daughters. The Predicational Head-Complement
Construction will license a VP headed by idiomatic kick followed by an NP headed
by idiomatic bucket.
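The bookkeeping performed by (26) can be sketched in a few lines. The sketch assumes that each valent requirement is modelled as a predicate over signs, that signs are dictionaries, and that the LID of idiomatic kick is its main frame die-punctually-fr; the function names, the marking value of the ordinary NP and the frame name ball-fr are hypothetical.

```python
def pred_hd_comp(head, complements):
    """Mother of a predicational head-complement construct, or None if unlicensed."""
    val = head["val"]
    if head["type"] != "word" or len(val) < 2:    # head is a word; L must be non-empty
        return None
    xarg, required = val[0], val[1:]              # head's VAL is <Y> followed by the list L
    if len(required) != len(complements):
        return None
    if not all(req(comp) for req, comp in zip(required, complements)):
        return None
    form = head["form"] + [word for comp in complements for word in comp["form"]]
    # The mother keeps the head's LID and is left with the single valent Y.
    return {"type": "phrase", "form": form, "lid": head["lid"], "val": [xarg]}


def any_np(sign):
    return sign["cat"] == "NP"


def the_bucket_np(sign):
    return (sign["cat"] == "NP"
            and sign["lid"] == ["i-bucket[null]-fr"]
            and sign["mrkg"] == "i-the[null]-fr")


KICKED = {"type": "word", "form": ["kicked"], "lid": ["die-punctually-fr"],
          "val": [any_np, the_bucket_np]}
THE_BUCKET = {"cat": "NP", "form": ["the", "bucket"],
              "lid": ["i-bucket[null]-fr"], "mrkg": "i-the[null]-fr"}
THE_BALL = {"cat": "NP", "form": ["the", "ball"],      # hypothetical ordinary NP
            "lid": ["ball-fr"], "mrkg": "the"}

vp = pred_hd_comp(KICKED, [THE_BUCKET])
print(vp["form"], len(vp["val"]))
print(pred_hd_comp(KICKED, [THE_BALL]))
```

All non-subject valents are consumed in one step, so the resulting VP carries only the subject requirement, which is the property the construction encodes by identifying the mother's VAL with ⟨Y⟩.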
We have seen how the Head-Functor Construction licenses an NP headed by
idiomatic bucket, selected by the, and possibly containing semantically external
modification: e.g. proverbial or metaphorical. We have also discussed the con-
struction that assembles verb phrases, among others, and their non-subject com-
plements. It remains to complete the picture by considering the listeme for id-
iomatic kick, which is given in (27):
(27) [ strans-v-lxm
       FORM  ⟨kick⟩
       SYN   [ VAL ⟨NP_i, NP[SYN [CAT|LID ⟨i-bucket-fr⟩, MRKG the]]⟩ ]
       SEM   [ FRAMES ⟨[die-punctually-fr, DECEDENT i]⟩ ] ]
contributes to the sentence’s meaning. The semantics of the lexeme kick consists
of the predicate ‘die punctually’ whose argument is identified with the index of its
potential subject. The second valent requirement will ensure that this verb projects
a VP containing a single NP, whose head lexeme is idiomatic bucket and whose
functor (‘specifier’) is meaningless the. The resulting VP will express a predi-
cate ‘die punctually’ that is ‘looking for’ a subject NP to provide the DECEDENT
argument to complete the predication.
This concludes our analysis of kick the bucket. Notice that the present ap-
proach eliminates the need for special devices for the analysis of semi-fixed ex-
pressions. Instead, they are analyzed in the same way as syntactically flexible
expressions, leaving their restricted properties to be explained in terms of the id-
iosyncrasies of their listemes in interaction with ordinary phrasal constructions.
4 Super-Flexible Idioms
The idioms we have examined so far exhibit varying degrees of flexibility. In this
section we explore the analysis of idioms whose variability includes more than
multiple morphological realizations or the positional variation of a single idiom
word. We first examine side the bread is buttered on and What’s X doing Y?, which
interact with wh-constructions in interesting ways, yet exhibit certain differences
which, we claim, can be simply treated in terms of diverse lexical constraints and
their interaction with the constraints imposed by English combinatorial construc-
tions and other listemes.
Consider first the sentences in (28):
(28) a. Yes, a little line drawn in the sand to let Bibi know which side his bread
is buttered on.
b. I know which side my bread is buttered on.
c. Google should know which side its bread is buttered on.
d. And don’t think for a minute that the “scientific community”, especially
the Al Gore global warming community, doesn’t know what side their
bread is buttered on.
We have not found any linguistic analyses of this idiom. Popular accounts usually
say—or more often imply—that it has the following properties:
B. The specifier of bread is a genitive pronoun.
C. The pronominal specifier of bread is bound to the subject of know.
D. The NP headed by side must cooccur with an interrogative wh-word deter-
miner.
E. The bread-is-buttered-on phrase is an embedded interrogative clause.
F. The immediate governor of the phrase headed by buttered is be, presumably
the passive auxiliary.
G. The preposition on is stranded.
The next section may be viewed as a methodological exercise. The idiom is
more abstract than suggested by the above picture. In fact none of the proper-
ties (A–G) is a necessary one. Often, when one notices a recurring grammatical
pattern and begins to investigate its constructional status, one finds that the truly
idiomatic part—the irreducible stuff that is unpredictable from anything else in the
grammar—is more abstract than the pattern one originally recognized. Following
this reduction of the scope of the syntactic investigation, we will look at the se-
mantics of the idiom. Then a somewhat more formal analysis of the construction
will be presented, in an effort to persuade the reader of the analytic versatility of
Sign-Based Construction Grammar.
B. The specifier of bread need not be a pronoun. It may be a genitive noun
phrase, either proper or common, or it may be the definite article the. The con-
straint seems to be only that it must be definite:
(30) a. In his pursuit to find out which side Benji’s bread is buttered on, Andy
J,. . .
b. Why is this drivel on Massey’s site? Makes one wonder who’s side
WSAZ’s bread is buttered on!
c. I chose the super closeup to convey the fact that we don’t know which
side the guy’s bread is buttered on. Eg. Goodie or baddie.
d. pictures of Arafat as if he was a hero when even Arafat knew which side
the bread was buttered on.22
22
More examples of this type for those inclined to reject these, again selected from many others
attested on the web:
(i) If this is not proof as to which side the MSM’S bread is buttered, then I don’t know what is!
(ii) It is now clear, though, on which side the Global Times’s bread is buttered.
(iii) LOL I think we know which side the umpire’s bread was buttered!
(iv) a little checking showed which side the military’s bread was buttered on and the side they’d
...
(v) Maria knows on which side the family’s bread is buttered, and realizes that without the
income generated by Vermeer’s oils, the household would be thrown into the street.
(vi) Cameron knows which side the UK’s bread is buttered on.
c. I know which side Haseo’s bread is buttered on.
d. He had the manners and listening attitude down pat—but you know
which side his bread is buttered on.
e. I’m pretty sure that we know which side Michelle’s bread is buttered on.
(32) a. Gmail allows you to use an email client, but they restrict some of the
goodies to web access, because that is the side their bread is buttered
on.
b. but if these guys step out of line I’m sure the side their bread is buttered
on will become apparent.
E. The pattern may occur without all or any part of it constituting an embedded
question—(32a) and (33), for example:
F. A verbal governor of the past participle buttered need not be the passive aux-
iliary be. The get-passive is also possible, although admittedly rare:
(34) a. Rep. Bachmann said. “They are incredulous about the possibility of
losing their majority and they know which side their bread gets buttered
on and ACORN is their friend.”
b. Beck knows which side his bread gets buttered on.
c. This story still uses liberal connotations and loaded terms, though, so
you can tell which side their bread gets buttered on. (punctuation added)
23
See also footnote 21.
d. TRUE - Stafford will be back at QB and Detroit has made some good
moves this off season. They know what side their bread gets buttered
on.
e. It’s no surprise then that they would sugar coat or down play anything
which makes their benefactors look bad, they know what side their bread
gets buttered on.24
G. Perhaps surprisingly, Pied Piping tokens of this idiom are about as frequent
on the web as preposition-stranding ones. In actual Google hits, the search string
"on ∗ side ∗ bread ∗ buttered" scores 480, versus 490 hits for "side ∗ bread ∗
buttered on". A few examples of Pied Piping:
(35) a. They know on which side their bread is buttered and will adapt quite
quickly to what they see as the potential winning side.
b. Wendy knows on which side her bread is buttered.
c. Still, few Costa Ricans have anything bad to say about their country’s
popularity as a destination—perhaps simply because they know on which
side their bread’s buttered.
d. Salespeople know on which side their bread’s buttered.
The contraction of bread’s in the last two examples suggests that the Pied Piping
version of the idiom is not restricted to formal contexts.
We also find examples with the stigmatized, preposition-doubled form, as well
as those with no preposition.
(i) Sometimes it needs outsiders such as these Spaniards to show them Yanks
on which side the bread is buttered on.
(ii) Knowing on which side his bread was buttered on, he flew the flag for the
conservative side of politics.
24
The use of get, rather than be, in this idiom is relatively rare. Google yields only 50 true hits
for their bread gets buttered on versus 498 for their bread is buttered on. We have not attempted
to compare this ratio of roughly 1/10 to the relative frequencies of the get- and be-passives. A case
could be made to count examples like those in (34) as word play or other forms of nonce extension
of the grammar, thus excluding them from the data on which analysis of the construction is based.
Judgment calls of this kind are seemingly unavoidable in grammatical research, even when based
on corpus data.
(iii) Pastors, knowing on which side their bread was buttered on, gave more to
those that paid more and gave little to those that contributed little.
(36) a. There will be a fine fracas soon, and I must see, whatever happens, that
my bread is well buttered.
b. There, as his declassified 600-page FBI file shows, his bread would be
buttered by Harry Bridges’ Communist- controlled International Long-
shore and Warehouse Union.
c. I fear that those people are basically keeping their bread well buttered by
fomenting the threat of terrorism, which serves their interest as it does
Beijing’s.
d. But it seems that the company’s bread will be buttered by users looking
to get around location restrictions.
e. Think about how Ike’s bread gets buttered at work. What is he rewarded
for?
f. From a business perspective, is your bread best buttered at your current
job or elsewhere?
g. Politicians everywhere seem to understand their bread is buttered with
political donations, even if those donors are in Washington, or Miami,
or Dallas, and that politician resides in Guatemala City.
h. OK, so he’s a screenwriter, right. Actually, no. His bread is really but-
tered with voice work. He’s been doing English dubbing for Japanese
anime since he was ten years old...
25
We are indebted to an anonymous referee for this observation.
26
We cite numerous examples to demonstrate the robustness of the phenomenon.
i. The most popular actors in American cinema know how their bread is
buttered and embrace the public.
j. It’s hard to blame a politician for knowing how the bread is buttered
around here.
k. Teixeira knows where his bread is buttered.
l. Suffice it to say, Mrs. Clinton knows where her bread is buttered.
m. Of course, there would be some countries who would resist, having had
their bread royally buttered by Blatter, but FIFA couldn’t survive the
resignations of Germany, Italy, Spain, England, France, USA, etc.
n. Despite having had their bread well-buttered by a series of British monar-
chs the Sultans became enamoured of German power and Prussian state
organization
(37) FRAMES ⟨ [ buttered[satisfy]-fr , LOCATION x , THEME y ] ⟩
word
FORM    ⟨buttered⟩
ARG-ST  ⟨ X:NP_y[ LID ⟨i-bread[needs]-fr⟩ , MRKG def ] , PP_x[ LID ⟨i-side[loc]-fr⟩ ] ⟩
SYN     [ CAT [ verb , LID ⟨[1]⟩ , VFORM pas ] ,
          VAL ⟨ X , ... ⟩ ,
          GAP ⟨ NomP_x ⟩ ]
SEM     [ IND s ,
          FRAMES ⟨ [1] buttered[satisfy](s,x,y) ⟩ ]
(38) trans-p-word
     FORM    ⟨on⟩
     SYN     [ CAT [ LID ⟨Z⟩ ] ]
     ARG-ST  ⟨ NP_y[ LID ⟨Z:i-side[loc]-fr⟩ ] ⟩
     SEM     [ INDEX y , FRAMES ⟨⟩ ]
The listemes for idiomatic bread and side are given in (39) and (40). The notation
cn-word abbreviates common-noun-word.
(39) cn-word
     FORM  ⟨bread⟩
     SYN   [ CAT [ LID ⟨Z⟩ ] ]
     SEM   [ INDEX y , FRAMES ⟨ Z:i-bread[needs](y) ⟩ ]
(40) cn-lxm
     FORM  ⟨side⟩
     SYN   [ CAT [ LID ⟨Z⟩ ] ]
     SEM   [ INDEX x , FRAMES ⟨ Z:i-side[loc](x) ⟩ ]
the bare preposition on will be stranded and so appear on the VAL list. In the
version of the idiom in which there is no PP[on] (and a fortiori no side), there will
be no PPx entry on the ARG-ST list and the value of GAP will be the empty list.
Let us illustrate a clause built by the interaction of the listemes constituting
this idiom with the relevant phrase-building constructions of English:
(41) [He finally figured out] what side the bread is buttered on.
Figure 2: buttered on

Mother:
  phrase
  FORM  ⟨buttered, on⟩
  SYN   VP[ CAT [ LID ⟨[2]⟩ , VF pas ] ,
            VAL ⟨ [3] NP_y ⟩ ,
            GAP ⟨ [1] NP_x[ LID ⟨i-side[loc](x)⟩ ] ⟩ ]
  SEM   [ IND s , FRAMES ⟨ [2] l1:buttered[satisfy](s,x,y) ⟩ ]

Head daughter (H):
  word
  FORM    ⟨buttered⟩
  SYN     V[ CAT [ LID ⟨[2]⟩ , VF pas ] ,
             VAL L⟨ [3] , [4] ⟩ ,
             GAP ⟨ [1] ⟩ ]
  ARG-ST  L
  SEM     [ IND s , FRAMES ⟨ [2] ⟩ ]

Non-head daughter [4]:
  word
  FORM  ⟨on⟩
  SYN   P[ GAP ⟨ [1] ⟩ ]
  SEM   [ IND x , FRAMES ⟨⟩ ]
In Figure 2 we see that the LID of the passive participle buttered is identified with
the content of the FRAMES value, via the tag [2].29 And, since LID is a CAT feature,
it reappears as the corresponding feature of the mother phrase buttered on. The
VAL list of buttered is identified, via the tag L, with its ARG-ST value, since
there is no extraction or null argument instantiation for buttered. The VAL list
contains the tags [3] and [4], the second of which tags the on constituent, which is
the right sister to buttered. Since the first valent, [3], is not realized as a sister,
it reappears in the VAL list of the mother, buttered on (uncancelled valents are
passed up to an expression’s mother, as in Categorial Grammar). Although there
is no extraction or null instantiation of arguments of buttered, there is nonetheless
a non-empty GAP list, since there is extraction of the object of the complement
sister on.30 Since a mother phrase inherits the gaps of its daughters (unless it is
the mother of a filler-gap construct), the GAP value of the mother is identified
with that of the lexical head. As a result, the singleton member of the GAP list
of on is indirectly passed to the mother buttered on and will require the filler NP
higher in the tree to be side-headed. On’s index, x, appears as the LOCATION of
the buttered[satisfy]-fr in the FRAMES value of the mother (and head daughter).
The Predicational Head-Complement Construction combines idiomatic but-
tered and the idiomatic preposition on to license the gap-containing, non-finite,
passive VP buttered on. This same construction combines the finite passive aux-
iliary is with buttered on to license the gap-containing, finite, passive VP is but-
tered on. The Subject-Predicate construction combines the NP the bread with the
gap-containing VP is buttered on to form the gap-containing clause the bread is
buttered on. This is summarized in the analysis tree shown in Figure 3. Finally,
the Non-Subject Wh-Interrogative Construction (Sag 2012:167) allows the filler
NP what side to combine with the gapped clause to license the non-gapped clause
what side the bread is buttered on, as shown in Figure 4.
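The step-by-step assembly just described (valence cancellation, GAP amalgamation by lexical heads, GAP inheritance by mothers, and discharge of the gap by a filler) can be sketched as follows. This is a toy model under simplifying assumptions: signs are dictionaries, a valent is identified with the very sign that realizes it, and the function names `word`, `head_comp`, `subj_pred` and `filler_gap` are hypothetical stand-ins for the constructions named in the text.

```python
# Toy model only: a valent is represented by the sign that will realize it.
SUBJ = {"form": ["the", "bread"], "cat": "NP", "gap": []}
SIDE_GAP = {"cat": "NP", "lid": "i-side[loc]-fr"}   # what the gap demands of its filler

def word(form, arg_st, own_gap=()):
    """A lexical head amalgamates the GAP values of its arguments into its own."""
    gap = list(own_gap) + [g for arg in arg_st for g in arg.get("gap", [])]
    return {"form": [form], "val": list(arg_st), "gap": gap}

def head_comp(head, *comps):
    """Head-complement: non-subject valents are cancelled, the subject is passed up."""
    assert head["val"][1:] == list(comps)
    return {"form": head["form"] + [w for c in comps for w in c["form"]],
            "val": head["val"][:1], "gap": head["gap"]}

def subj_pred(subj, vp):
    """Subject-predicate: the last valent is cancelled, the gap is still inherited."""
    assert vp["val"] == [subj]
    return {"form": subj["form"] + vp["form"], "val": [], "gap": vp["gap"]}

def filler_gap(filler, clause):
    """Filler-gap: the filler must match the single gap, which is then discharged."""
    (gap,) = clause["gap"]
    assert all(filler.get(k) == v for k, v in gap.items())
    return {"form": filler["form"] + clause["form"], "val": [], "gap": []}

on = word("on", [], own_gap=[SIDE_GAP])        # stranded 'on': its object is a gap
buttered = word("buttered", [SUBJ, on])        # amalgamates the gap of 'on'
buttered_on = head_comp(buttered, on)
is_ = word("is", [SUBJ, buttered_on])          # raising auxiliary, same subject
is_buttered_on = head_comp(is_, buttered_on)
clause = subj_pred(SUBJ, is_buttered_on)

what_side = {"form": ["what", "side"], "cat": "NP",
             "lid": "i-side[loc]-fr", "gap": []}
full = filler_gap(what_side, clause)
```

The point of the sketch is that the requirement that the filler be side-headed originates on the stranded preposition and travels upward unchanged until the filler-gap step, where the mother's GAP list becomes empty.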
29
In the following discussion the reader familiar with neither SBCG nor HPSG may wish to
consult Sag (2010a) for technical details.
30
This follows the analysis of Sag (2010b), where each predicate ‘amalgamates’ the GAP values
of its arguments into its own GAP value. For an alternative compatible analysis, see Chaves (2012).
Figure 3: the bread is buttered on

Mother:
  FORM  ⟨the, bread, is, buttered, on⟩
  SYN   S[ GAP ⟨ [1] NP_x[ LID ⟨i-side[loc](x)⟩ ] ⟩ ]
  SEM   [ FRAMES ⟨ [4] l3:the(y,l2,l1) , [5] l2:i-bread[needs](y) ,
                   [2] l1:buttered[satisfy](s,x,y) ⟩ ]

  Subject daughter [3]:
    FORM  ⟨the, bread⟩
    SYN   NP
    SEM   [ IND y , FRAMES ⟨ [4] , [5] ⟩ ]

  Head daughter (H):
    FORM  ⟨is, buttered, on⟩
    SYN   VP[ GAP ⟨ [1] ⟩ , VAL ⟨ [3] NP_y ⟩ ]
    SEM   [ FRAMES ⟨ [2] ⟩ ]

    Head daughter (H):
      FORM  ⟨is⟩
      SYN   V[ GAP ⟨ [1] ⟩ , VAL ⟨ [3] , [6] ⟩ ]
      SEM   [ FRAMES ⟨⟩ ]

    Complement daughter [6]:
      FORM  ⟨buttered, on⟩
      SYN   VP[ CAT [ LID ⟨[2]⟩ , VF pas ] , VAL ⟨ [3] ⟩ , GAP ⟨ [1] ⟩ ]
      SEM   [ FRAMES ⟨ [2] ⟩ ]
Mother:
  FORM  ⟨what, side, the, bread, is, buttered, on⟩
  SYN   S[ GAP ⟨⟩ ]
  SEM   [ FRAMES ⟨ [8] what(x,l7,l3) , [7] l7:i-side[loc](x) , [4] l3:the(y,l2,l1) ,
                   [5] l2:i-bread[needs](y) , [2] l1:buttered[satisfy](s,x,y) ⟩ ]

  Filler daughter [1]:
    FORM  ⟨what, side⟩
    SYN   NP[ LID ⟨[7]⟩ ]
    SEM   [ IND x , FRAMES ⟨ [8] , [7] ⟩ ]

  Head daughter (H):
    FORM  ⟨the, bread, is, buttered, on⟩
    SYN   S[ GAP ⟨ [1] ⟩ ]
    SEM   [ FRAMES ⟨ [4] , [5] , [2] ⟩ ]

Figure 4: The NonSubject Wh-Interrogative Clause: what side the bread is buttered on
The semantic fact of particular importance in Kay and Fillmore’s discussion is the
interpretation paraphrasable in terms of why, how come or what is the reason that,
as indicated in (42).
The essential ingredients of WXDY are the following:
b. a form of the copula governing doing
c. a gap associated with the object of the progressive participle of the verb
do
d. a predicative XP following doing, forming a constituent with it
e. the impossibility of negation, either of be or of do
f. a causal interrogative semantics
g. a pragmatic attribution of incongruity of the proposition whose cause is
being questioned.
(44) a. I wonder what the salesman will say this house is doing without a kitchen.
(invented example, Kay and Fillmore 1999, ex. (3)c)
b.*What does your name keep doing in my book?
c.*What will your name (be) do in my book?
d. What is he doing? (lacks WXDY semantics)
e.*What weren’t you doing (not) talking to my aunt?
f.#What is he doing drunk, which everyone knew he would be?
Example (44a) is of particular importance in showing that the scope of the causal
operator is not necessarily the same as the clause following what. That is, though
the position of what demarcates the top of the interrogative clause, it is the em-
bedded structure this house is doing without a kitchen whose causality is to be
explained by the salesman. (44a) does not mean ‘I wonder why it is that the sales-
man will say that this house lacks a kitchen’.
WXDY finds a simple analysis within SBCG. This analysis, like the previous
ones, is purely lexical in nature. First, in order to account for the role of be in
WXDY, we posit a listeme like the following:
(45) copula-lxm
     ARG-ST  ⟨ X , VP[ LID ⟨i-doing-fr⟩ ] ⟩
This listeme, which inherits all of its remaining properties from the Copula Con-
struction (a lexical class construction), selects a subject (X) and a VP complement
whose LID is the idiomatic i-doing-frame. Like other copula be-lexemes, this is
an auxiliary verb with subject-raising properties. And because its FRAMES list is
empty, it makes no contribution to the semantics.
The lexicon contains only one listeme whose LID is i-doing-fr, and hence only
one lexeme that gives rise to words that can head the VP complement of the be in
(45). This listeme, because it includes the specification [VF prp], will have only
one kind of word realization—a present participle as sketched in (46):
(46) word
     FORM    ⟨doing⟩
     ARG-ST  ⟨ [1] , [2] , [3] XP[ VAL ⟨[1]⟩ , LTOP l ] ⟩
     SYN     [ CAT [ verb , LID ⟨i-doing-fr⟩ , VF prp ] ,
               VAL ⟨ [1] , [3] ⟩ ,
               GAP ⟨ [2] NP_x[ LID ⟨what-fr⟩ ] ⟩ ]
     SEM     [ INDEX x ,
               FRAMES ⟨ [ justification-fr , EXPLICANS x , EXPLICANDUM l ] ⟩ ]
The ARG-ST list of the verb in (46) contains three elements: a subject ([1]), a
direct object ([2]), and a predicational phrase ([3]). The predicational phrase, XP,
has a unique (subject) valent, which is identified with the subject valent of doing.
The indication [LTOP l] encodes the identification of the principal frame31 of the
semantics of the predicational phrase with the EXPLICANDUM argument of doing.
The direct object is absent from the VAL list and present on the verb’s GAP list.
This element is specified as [INDEX x], which identifies it with the index of the
verb’s semantics, and [LID ⟨what-fr⟩], assuring that the filler daughter will also be
31
In a FRAMES list, the members form a virtual, singly-rooted tree via use of the LABEL feature
to allow all but one frame of the list to be identified with the value of an attribute of one other
member. See Sag (2012:6ff), following Copestake et al. (2005).
so specified. The relevant properties of the word what are specified in (47). This is
the ordinary interrogative noun what, as indicated by its non-null WH value. The
empty-set value of the REL feature indicates that what is not a relative word, such
as, for example, who in the knave who stole the tarts. In the SEM value the what-fr
shows what to be an interrogative quantifier. Its BOUND VARIABLE x is identified
with the unique argument of its RESTRICTION, the thing-fr, as indicated by the
tags l1 . (The notations label for the features LABEL and SCOPE of the what-fr
indicate that these features take an unspecified value of type label.)
(47) word
     FORM  ⟨what⟩
     SYN   [ CAT [ noun , LID ⟨ [1] what-fr ⟩ , SELECT none ] ,
             WH [1] ,
             REL no ]
     SEM   [ INDEX x ,
             FRAMES ⟨ [ what-fr , LABEL label , BV x , REST l1 , SCOPE label ] ,
                      [ thing-fr , LABEL l1 , INST x ] ⟩ ]
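The division of labour among the copula listeme, the doing word and the filler what can be made concrete with a small sketch. It assumes a simplified dictionary encoding; the names `DOING`, `copula_accepts` and `filler_matches` are hypothetical, and only the features mentioned in the text are modelled.

```python
# Simplified encoding of the 'doing' word: three arguments, of which the
# direct object is placed on GAP rather than VAL.
subj = {"tag": 1}
obj = {"tag": 2, "cat": "NP", "index": "x", "lid": "what-fr"}
pred = {"tag": 3, "cat": "XP", "val": [subj], "ltop": "l"}

arg_st = [subj, obj, pred]
gap = [obj]
DOING = {
    "form": "doing",
    "lid": "i-doing-fr",
    "vf": "prp",
    "arg_st": arg_st,
    "val": [a for a in arg_st if a not in gap],   # subject and XP only
    "gap": gap,
    "index": "x",
    "frames": [{"type": "justification-fr", "EXPLICANS": "x", "EXPLICANDUM": "l"}],
}

def copula_accepts(vp_head):
    """The WXDY copula selects a VP complement whose LID is i-doing-fr."""
    return vp_head["lid"] == "i-doing-fr"

def filler_matches(filler, verb):
    """The filler must bear the LID that the gapped element is specified for."""
    (gapped,) = verb["gap"]
    return filler["lid"] == gapped["lid"]

WHAT = {"form": "what", "lid": "what-fr"}
WHICH_THING = {"form": "which thing", "lid": "thing-fr"}   # illustrative non-match
ORDINARY_DOING = {"form": "doing", "lid": "do-fr"}         # illustrative non-idiomatic verb
```

On these assumptions, only the idiomatic doing can head the copula's complement, and only a filler specified as what-fr can discharge its gap.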
(48) a. What is your name doing in my book? (=42b)
b. What is the justification of your name being in my book?
In this idiom it is possible to insert an optional modifier between wrong and tree,
as shown in (51). The possibility of such modifiers forecloses the possibility of
treating the whole idiom as a ‘word with spaces’.
Mother:
  wh-int-ns-cl
  FORM  ⟨what, is, Bo, doing, here⟩
  SYN   S[ CAT [ VF fin , IC + , INV + ] , GAP ⟨⟩ ]
  SEM   [ LTOP l0 ,
          FRAMES ⟨ [1] [ what-fr , LABEL l0 , BV x , REST l1 , SCOPE l2 ] ,
                   [2] [ thing-fr , LABEL l1 , INST x ] ,
                   [3] [ justification-fr , LABEL l2 , EXPLICANS x , EXPLICANDUM l3 ] ,
                   [4] [ location-fr , LABEL l3 , LOCATUM Bo , LOCATION here ] ⟩ ]

  Filler daughter:
    FORM  ⟨what⟩
    SYN   NP[ WH [1] , REL {} ]
    SEM   [ IND x , FRAMES ⟨ [1] , [2] ⟩ ]

  Head daughter:
    FORM  ⟨is, Bo, doing, here⟩
    SYN   S[ CAT [ VF fin , IC + , INV + ] , GAP ⟨ NP_x ⟩ ]
    SEM   [ FRAMES ⟨ [3] , [4] ⟩ ]
b. Barking up the wrong evidence tree. . .
c. Could we have been barking up the wrong linguistic tree all these years
by over-emphasizing the importance of complexity and accuracy?
d. It isn’t always possible to avoid wrong turns, but these steps may help
keep you from barking up the wrong family tree.
(52) a. The Sox are singing a different, quieter tune ahead of spring training.
b. However, after spotting a few emerald green dresses underneath the
mistletoe last year, I may be singing a different (Christmas) tune. . .
c. . . . today’s deal from Light Touch Aesthetics for a restorative body wrap
will have you singing a different holiday tune.
d. To most Americans, this school marm brandishing a switch seemed to
be singing a different, even principled, tune.
The problem that facts like these raise for approaches like the HPSG analysis of
Pollard and Sag (1994), or the SBCG treatment sketched in Sag (2012), is that the
Head-Functor analysis of noun phrases provides no way for a common noun to
require a modifier that stands in the functor relation to it. The main predicator,
e.g. idiomatic bark, can require a PP complement headed by idiomatic up, and
idiomatic up can require an object headed by idiomatic tree, but idiomatic tree
cannot specify that it is modified by idiomatic wrong. Indeed, as Richter and
Sailer (2009) have shown, this problem of nonlocal dependencies in idiomatic
expressions is widespread, crossing clause boundaries in the case of many German
idioms that they analyze. In the next section, we show how the same tools we
apply here for treating nonlocal idiomatic dependencies in English can provide an
account of the German data they discuss.
This problem relates to an issue that we mentioned in passing in section 3,
where we proposed the listeme (24) (repeated here) to analyze the semantically
empty lexeme bucket of kick the bucket:
(24) cn-lxm
     FORM  ⟨bucket⟩
     SYN   [ CAT [ LID ⟨i-bucket-fr⟩ ] , VAL ⟨⟩ ]
     SEM   [ IND none , FRAMES ⟨⟩ ]
This listeme shows an empty FRAMES list and an LID value of i-bucket-frame,
unlike the other listemes discussed by Sag (2012), where the FRAMES list and LID
value are identified. This raises the question of how to state the general principle
and to allow for its exceptions. Given our system of monotonic constraints, this
boils down to the question of what class of expressions (i.e. what lexical type) is
subject to the identity constraint and what exceptions need to be stated.
The solution to both the problems of modifiers in idioms and obligatory mod-
ification in general requires a slight revision in the typology of lexemes. It is
convenient to begin by considering the second problem mentioned, obligatory
modification. The MRKG feature has until now functioned much like the LID
feature, except that the latter is clearly ‘passed up’ from head daughter to mother,
while MRKG passes up from the non-head daughter to its mother in a head-functor
construct. Both features are conceived, however, as providing a partial semantic
characterization of the expression they mark. Since the hierarchy representing the
range of the LID feature provides a comprehensive semantic taxonomy of lexemes,
there is neither formal nor empirical motivation for positing a distinct range of val-
ues for the MRKG feature. Any non-maximal classification of a lexeme, such as
degree word, deg, or definite, def, will appear at some level in the frame hierarchy
degree-fr or definite-fr; any maximal classification of a lexeme, such as i-wrong-
fr, will of course also appear there. We simplify the formalism by eliminating a
special set of values for the MRKG feature and identifying the MRKG value and
LID value for all lexemes.33
Thus identifying the MRKG and LID values provides a solution to the obliga-
tory modification problem. Under the proposed change a predicator or functor can
specify in its respective ARG-ST or SELECT value a complement or head with a
33
This identification holds only for lexemes. We will see how the fact that phrases can, and
often do, have different LID and MRKG values provides a strategic advantage. Just as there are se-
mantically empty frames that serve to identify (projections of) semantically null idiom words like
idiomatic bucket, so there are semantically empty frames that serve to identify certain “marker”
words, such as the complementizer that.
specified MRKG value. This MRKG value of a head can, in the relevant cases, only
be acquired from combination with a functor sister bearing that MRKG value in a
head-functor construct, and the MRKG value can identify that modifier uniquely
if desired. A predicator or potential functor (mainly determiner or modifier) can
specify, as loosely or tightly as required, a modifier of the head of the phrase it
subcategorizes for or selects. For example, the idiomatic the of up the wrong tree
can specify that the idiom word tree it selects has a MRKG value that guarantees
that tree is modified by wrong. Similarly, idiomatic up can require not only that
the NP it subcategorizes for have an LID value calling for idiomatic tree as its
head, but also that the MRKG value of that NP indicate the idiomatic the, which in
turn, as we just saw, ensures that the tree-headed nominal that the selects contains
the idiomatic modifier wrong.34
We first consider how the lexemes for the, wrong, and tree are combined into
the phrase the wrong tree.35 The terminal nodes in the analysis tree in Figure 6
display the relevant information about the lexemes the, wrong, and tree.36 The
SELECT value of wrong calls for a selectee whose LID and MRKG values both
show l3:i-tree[choice](x). The word formed from the idiomatic tree lexeme is
the only possible expression satisfying this constraint. The mother of the resulting
head-functor construct is a phrase specified both as [LID ⟨l3:i-tree[choice](x)⟩]
and [MRKG l2:i-wrong(x,l3)], the former inherited from the head daughter and
the latter from the functor daughter, licensed by the Head-Functor Construction.
At the next level, a syntactically idiomatic but semantically canonical lexeme the
selects a phrase with [LID ⟨l3:i-tree[choice](x)⟩] and [MRKG i-wrong-fr], again
licensed by the Head-Functor Construction. The resulting (mother) phrase has
[LID ⟨l3:i-tree[choice](x)⟩] and [MRKG the[wrong.tree]-fr]. We will see presently
34
The idiom word up is syntactically a preposition, not a particle. In fact, there is a variant of
the idiom, which some sources characterize as a “mistake,” employing the preposition at instead
of up. One can’t *bark the wrong tree up/at. A referee points out that in order to license [very
wrong] tree – for which Google searches provide substantial support – one has to assume that
intensifier very identifies its MARKING value with that of the AP it modifies.
35
We assume that bark, up, the, wrong, and tree are all idiom words; the and wrong, however,
have the same semantics as the corresponding canonical words, while differing in some syntactic
properties, specifically LID and MRKG features. In particular, the has LID and MRKG features
the[wrong.tree]-fr. We take the semantic breakdown of the idiom to be, roughly, bark ‘make’ up
(simply raises the semantics of its object) the ‘the’ wrong ‘wrong’ tree ‘choice’.
36
Note that in the lexemes the, wrong, and tree, the LID and MRKG values are identified, while
this is not the case for the phrases wrong tree and the wrong tree. In Figure 6, and elsewhere
below, the quantifier the appears without a third argument because the phrase being modeled does
not contain the quantifier’s scopal phrase. For example, the(x,l2 ) abbreviates the(x,l2 ,label).
Figure 6: the wrong tree

Mother:
  FORM  ⟨the, wrong, tree⟩
  SYN   NP[ CAT [ LID ⟨[1]⟩ ] , MRKG [8] ]
  SEM   [ FRAMES ⟨ [8] l0:the(x,l2) , [7] l2:i-wrong(x,l3) , [1] l3:i-tree[choice](x) ⟩ ]

  Functor daughter:
    FORM  ⟨the⟩
    SYN   [ CAT [ LID ⟨[8]⟩ , SEL [ CAT [ LID ⟨[1]⟩ ] , MRKG [7] ] ] ,
            MRKG [8] ]
    SEM   [ FRAMES ⟨ [8] ⟩ ]

  Head daughter:
    FORM  ⟨wrong, tree⟩
    SYN   [ CAT [ LID ⟨[1]⟩ ] , MRKG [7] ]
    SEM   [ FRAMES ⟨ [7] , [1] ⟩ ]

    Functor daughter:
      FORM  ⟨wrong⟩
      SYN   [ CAT [ LID ⟨[7]⟩ , SEL [ CAT [ LID ⟨[1]⟩ ] , MRKG [1] ] ] ,
              MRKG [7] ]
      SEM   [ FRAMES ⟨ [7] ⟩ ]

    Head daughter:
      FORM  ⟨tree⟩
      SYN   [ CAT [ LID ⟨[1]⟩ ] , MRKG [1] ]
      SEM   [ FRAMES ⟨ [1] ⟩ ]
that a phrase bearing just these constraints satisfies a valence requirement of
idiomatic up. We have thus exemplified how identifying the LID and MRKG values
in each listeme eliminates the otherwise troubling problem of obligatory
modification.
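The way LID and MRKG values travel through head-functor constructs can be sketched in a few lines. The encoding is an assumption made for illustration only: frame labels and arguments are omitted, values are bare frame names, and `lexeme` and `head_functor` are hypothetical helper names.

```python
def lexeme(form, frame, select=None):
    """For lexemes, MRKG and LID are identified, as proposed in the text."""
    return {"form": [form], "lid": frame, "mrkg": frame, "select": select}

def head_functor(functor, head):
    """The mother takes LID from the head daughter and MRKG from the functor."""
    wanted = functor["select"]
    if wanted is None or any(head[k] != v for k, v in wanted.items()):
        raise ValueError("functor does not select this head")
    return {"form": functor["form"] + head["form"],
            "lid": head["lid"], "mrkg": functor["mrkg"], "select": None}

TREE = "i-tree[choice]-fr"
tree = lexeme("tree", TREE)
wrong = lexeme("wrong", "i-wrong-fr", select={"lid": TREE, "mrkg": TREE})
the = lexeme("the", "the[wrong.tree]-fr",
             select={"lid": TREE, "mrkg": "i-wrong-fr"})

wrong_tree = head_functor(wrong, tree)          # LID tree, MRKG i-wrong-fr
the_wrong_tree = head_functor(the, wrong_tree)  # LID tree, MRKG the[wrong.tree]-fr

try:
    head_functor(the, tree)        # idiomatic 'the' cannot skip 'wrong'
    skipped_modifier_ok = True
except ValueError:
    skipped_modifier_ok = False
```

Because the bare word tree has a MRKG value identical to its LID, idiomatic the cannot combine with it directly; the MRKG value it requires can only come from a functor sister, which is how the modifier is made obligatory.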
Before continuing with the derivation of ‘barking up the wrong tree’, we pro-
pose a further emendation to the framework of Sag (2012). Verbal lexemes are
there partially defined by the following lexical class construction (Sag 2012:112):
(53) Verb Lexeme Construction (↑lexeme):
     verb-lxm ⇒ [ ARG-ST ⟨ X , ... ⟩ ,
                  SYN [ CAT [ verb , LID L , SELECT none , XARG X ] ,
                        MRKG unmk ] ,
                  SEM [ LTOP l0 =q l1 ,37
                        FRAMES L:⟨ ([LABEL l1]) ⟩ ] ]
(54)              lex-sign
                     |
                   lexeme
                 /        \
   contributing-lxm      vacuous-lxm
We note that in (55d) the FRAMES and LID values are not identified, as they are in
(55c).
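The contrast between the two lexeme types can be stated as a simple well-formedness check. The sketch below rests on an assumption drawn from the surrounding discussion rather than from a stated constraint: a contributing-lxm identifies its FRAMES list with its LID value, while a vacuous-lxm has an empty FRAMES list and an LID that serves only for identification. The function name `well_formed` and the list encoding are hypothetical.

```python
def well_formed(lexeme):
    """Check the assumed type constraints on LID and FRAMES."""
    lid, frames = lexeme["lid"], lexeme["frames"]
    if lexeme["type"] == "contributing-lxm":
        return frames == lid                 # LID and FRAMES identified
    if lexeme["type"] == "vacuous-lxm":
        return frames == [] and lid != []    # identifying frame, no meaning
    raise ValueError("unknown lexeme type")

TREE = {"type": "contributing-lxm",
        "lid": ["i-tree[choice]-fr"], "frames": ["i-tree[choice]-fr"]}
BUCKET = {"type": "vacuous-lxm", "lid": ["i-bucket-fr"], "frames": []}
UP = {"type": "vacuous-lxm", "lid": ["i-up-fr"], "frames": []}
# A lexeme typed as contributing but lacking frames violates the assumed constraint.
MISTYPED = {"type": "contributing-lxm", "lid": ["i-bucket-fr"], "frames": []}
```

Under this assumption, idiomatic bucket and idiomatic up are the exceptions that the type vacuous-lxm is meant to accommodate.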
We now complete our analysis of barking up the wrong tree. Idiomatic up
is of type vacuous-lxm. This lexeme, since it is specified as [SEM [FRAMES ⟨⟩]],
adds no frames to the semantics of the phrase it projects and it concomitantly
displays an LID value, i-up-fr, distinct from its FRAMES value, ⟨⟩. As mentioned
before, idiomatic up selects a (noun) phrase with [LID ⟨i-tree[choice]-fr⟩] and
[MRKG the-fr]. The listeme for idiomatic up is given in (56):39
38
Type declarations, which occur in the grammar signature rather than the “constructicon,” em-
ploy the format type: [constraint AVM]
39
Notice that this treatment, like most lexical analyses in the HPSG/SBCG tradition, is highly
modular. Further properties of the lexemes instantiating (56) follow from the interaction of other
constructions, e.g. (i) and (ii):
(i) Intransitive Preposition Lexeme Construction:
    intran-p-lxm ⇒ p-lxm & [ ARG-ST ⟨NP_x⟩ , SEM [INDEX x] ]
(ii) Preposition Lexeme Construction:
    p-lxm ⇒ [SYN [CAT preposition]]
Figure 7: up the wrong tree

Mother:
  sat-hd-comp-cxt
  FORM  ⟨up, the, wrong, tree⟩
  SYN   PP[ CAT [ LID ⟨[1]⟩ ] , MRKG [8] ]
  SEM   [ FRAMES ⟨ [8] l0:the(x,l2) , [7] l2:i-wrong(x,l3) , [1] l3:i-tree[choice](x) ⟩ ]

  Head daughter:
    word
    FORM  ⟨up⟩
    SYN   [ CAT [ preposition , LID ⟨i-up-fr⟩ ] , VAL ⟨ [2] ⟩ ]
    SEM   [ INDEX x , FRAMES ⟨⟩ ]

  Complement daughter [2]:
    FORM  ⟨the, wrong, tree⟩
    SYN   NP[ CAT [ LID ⟨[1]⟩ ] , MRKG [8] ]
    SEM   [ FRAMES ⟨ [8] , [7] , [1] ⟩ ]
(56) vacuous-lxm & intran-p-lxm
     FORM    ⟨up⟩
     SYN     [ CAT [ LID ⟨i-up-fr⟩ ] ]
     ARG-ST  ⟨ [ SYN [ CAT [ LID ⟨i-tree[choice]-fr⟩ ] ,
                       MRKG the[wrong-tree]-fr ] ] ⟩
And this lexeme gives rise to a word that projects a saturated head-complement
construct (sat-hd-comp-cxt) whose mother’s FORM value is ⟨up, the, wrong, tree⟩,
as shown in Figure 7. This structure is licensed by the Saturational Head-Complement
Construction (Sag 2012:188), which licenses the up word serving as the head
daughter and the the wrong tree phrase as the complement daughter.
There is no news in the remainder of our analysis of bark up the wrong tree.
The idiomatic listeme bark is formulated to contribute a frame meaning roughly
‘make’, to subcategorize for a subject that represents the agent of that frame and a
Figure 8: idiomatic bark

prep-intran-v-lxm
FORM  ⟨bark⟩
SYN   [ CAT [ LID ⟨[1]⟩ ] ,
        MRKG [1] ,
        VAL ⟨ NP_y , PP_x[ LID ⟨i-up-fr⟩ ] ⟩ ]
SEM   [ INDEX s ,
        FRAMES ⟨ [1] [ bark[make]-fr , SIT s , AGENT y , THEME x ] ⟩ ]
PP, headed by idiomatic up, that expresses the incorrect choice made by the agent.
The lexeme for idiomatic bark is shown in Figure 8.
LID and MRKG are not the only features relevant to the analysis of nonlocal
idiomatic dependencies; XARG has a significant role to play, as well. There are
many English idioms that require referential and agreement identity between a
genitive within an NP and some other argument of the idiom, or which assign a
semantic role to the embedded genitive. Some of these are illustrated in (57)–(58):
(59) strans-v-lxm
     FORM    ⟨lose⟩
     ARG-ST  ⟨ NP_x , NP_y[ XARG [pron]_x , LID ⟨i-cool[composure]-fr⟩ ] ⟩
     SEM     [ FRAMES ⟨ [ losing-fr , AGENT x , THEME y ] ⟩ ]
This specification requires both that the object NP contain a prenominal pronomi-
nal genitive NP and that that pronoun be coindexed with the subject of lose (block-
ing *He lost your cool and the like).
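The coindexation requirement can be illustrated with a short sketch. It assumes a dictionary encoding in which an NP exposes its prenominal genitive as the value of `xarg`; the names `gen_nom` and `lose_licenses`, and the index labels `x1` and `x2`, are hypothetical.

```python
def gen_nom(genitive, noun):
    """Sketch of a genitive nominal: the mother's XARG is the genitive NP."""
    return {"form": genitive["form"] + noun["form"],
            "lid": noun["lid"], "xarg": genitive, "mrkg": "genitive-fr"}

def lose_licenses(subject, obj):
    """Idiomatic 'lose' needs a cool-headed object whose XARG is a pronoun
    coindexed with the subject."""
    xarg = obj.get("xarg")
    return (obj["lid"] == "i-cool[composure]-fr"
            and xarg is not None
            and xarg["pron"]
            and xarg["index"] == subject["index"])

cool = {"form": ["cool"], "lid": "i-cool[composure]-fr"}
he = {"form": ["he"], "index": "x1"}
his = {"form": ["his"], "pron": True, "index": "x1"}
your = {"form": ["your"], "pron": True, "index": "x2"}

his_cool = gen_nom(his, cool)
your_cool = gen_nom(your, cool)

good = lose_licenses(he, his_cool)    # He lost his cool
bad = lose_licenses(he, your_cool)    # *He lost your cool
```

The verb never inspects the internal structure of its object; it sees only the XARG value that the object NP makes visible at its top node.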
We assume that NPs like your cool are built via the same Genitive Nominal
Construction that is used for such NPs generally. This construction requires that
the mother’s XARG value be identified with the prenominal genitive NP, as in (60):
(60) gen-nom-cxt

     Mother:
       FORM  ⟨your, cool⟩
       SYN   [ CAT [2] [ noun , LID ⟨i-cool[composure]-fr⟩ , XARG [1] ] ,
               MRKG genitive-fr ]

       Daughter [1]:
         FORM  ⟨your⟩
         SYN   NP

       Daughter:
         FORM  ⟨cool⟩
         SYN   [ CAT [2] , MRKG i-cool[composure]-fr ]
Thus, because only certain verbs, e.g. keep, lose and blow (in their relevant idiomatic
senses), select a direct object whose LID value is ⟨i-cool-fr⟩, these are the
only lexical elements that can govern NPs headed by cool in its relevant idiomatic
sense. The genitive within the cool NP and the subject of the governor are always
coindexed. Various semantic treatments are possible. For example Sag (2012)
expresses some analyses in the version of MRS used here and others in the se-
mantics developed in Ginzburg and Sag (2000). The lexical entry in (59) assumes
that lose is dyadic, with the direct object NP forming a second semantic argument
(the THEME argument).
The phenomena just discussed are outside the analytic scope of the version
of HPSG developed by Pollard and Sag (1994). As argued in Sag (2010b, 2012),
these data provide considerable motivation for the analysis of verbal and nominal
signs in terms of nonempty XARG specifications. Finally, note that XARG values,
unlike VAL lists, do not ‘shrink’ in a bottom-up progression from head daughter to
mother within an analysis tree. That is, no elements are ‘cancelled off’ an XARG
list—the information about the external argument is locally visible at the top of
the phrasal domain projected by the lexical head because XARG is a CAT feature
and hence is regulated by head feature percolation.
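The contrast between VAL and XARG can be made concrete with a small Python sketch. This is an illustration under simplifying assumptions, not an implementation of SBCG: `Sign` and `project` are hypothetical names, valents are plain strings, and head feature percolation is reduced to copying the XARG value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Sign:
    label: str
    val: Tuple[str, ...]     # valents still to be realized
    xarg: Optional[str]      # external argument (a CAT feature in the text)


def project(head: Sign, label: str, realized: Tuple[str, ...]) -> Sign:
    """Build the mother of a phrase headed by `head`."""
    remaining = list(head.val)
    for valent in realized:
        remaining.remove(valent)          # VAL shrinks: the valent is cancelled off
    # XARG is passed up unchanged, mimicking head feature percolation
    return Sign(label, tuple(remaining), head.xarg)


lose = Sign("V", ("NP[subj]", "NP[obj]"), "NP[subj]")
vp = project(lose, "VP", ("NP[obj]",))
s = project(vp, "S", ("NP[subj]",))

print(vp.val, vp.xarg)  # ('NP[subj]',) NP[subj]
print(s.val, s.xarg)    # () NP[subj]
```

At the clause level the VAL list is empty, yet the external argument is still visible on the mother, which is the property the analysis relies on.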
One more idiom further illustrates some of the features of idioms already dis-
cussed and also provides a particularly interesting case of a bound genitive pro-
noun, e.g. blow one’s nose, crane one’s neck, get one’s comeuppance. In the idiom
illustrated in (61), the verb keep involves an AGENT (the keeper of the secret), a
THEME (the item kept ‘under one’s hat’, i.e. ‘kept secret’), and the under-phrase,
which specifies where the secret is kept.
(61) a. Maybe Tim kept the best stories under his hat?
b. This has never been shared before, so keep it under your hat.
c. . . . so I kept it under my hat until I had enough money to shoot it myself.
We follow the online analysis proposed by The Phrase Finder, where under one’s
hat is decomposed as ‘in one’s head’, i.e. not communicated and hence ‘kept
secret’. Notice that there are variant expressions involving keep and under, e.g.
They kept it under wraps, keep it under control, the idiomaticity of which seems
less tightly bound to keep. Thus one finds examples like (62a,b):
(62) a. We wonder how long this story was under wraps.
b. Everything is under control.
By contrast, the idiomatic interpretation of under one’s hat requires the presence
of keep. The invented examples in (63) have no idiomatic interpretation:41
(63) a. This story is under my hat.
b. They have the information under their hat.
41
This was apparently not always the case:
(i) The man whose estate lies under his hat need never tremble before the frowns of fortune.
[What I Remember, by Thomas Trollope (the brother of Anthony), 1887.]
Thus our analysis of keep under one’s hat requires a listeme like (64), where keep
selects a subject, a direct object, and an under-PP whose object is headed by hat.
(64) obj-control-v-lxm
     [ FORM   ⟨keep⟩
       ARG-ST ⟨NP_x, NP_y, PP[VAL ⟨pro_y⟩, LID ⟨i-under[in]-fr(y,z)⟩, XARG pron_x, LTOP l]⟩
       SEM    [ FRAMES ⟨keep-fr [SIT s, AGENT x, THEME y, STATE l]⟩ ] ]
The index y of the first argument of the i-under[in]-fr of the PP valent is identified
with the index of the object NPy of keep and the THEME argument of the keep-fr.
The XARG pronx of the PP valent is coindexed with the subject NPx and its index
x is identified with the AGENT argument of the keep-fr.
Since i-under[in]-fr is an i-frame, the idiomatic interpretation will be licensed
only when selected by an appropriate governor. In this way, the idiomatic use of
the under-phrase is restricted to the contexts in which it is governed by keep.
This idiomatic lexeme for under is sketched in (65):
(65) trans-p-lxm
     [ FORM   ⟨under⟩
       SYN    [ CAT [XARG X, LID ⟨Y⟩] ]
       ARG-ST ⟨NP_y, NP_z[XARG X: pron_x, LID ⟨i-hat[head]-fr⟩]⟩
       SEM    [ FRAMES ⟨Y: i-under[in]-fr [LOCANDUM y, LOCATION z]⟩ ] ]
This lexeme guarantees that idiomatic under governs an i-hat-headed NP whose
pronominal genitive determiner XARG is identified with the XARG of the preposition
itself, hence implicitly of the PP it projects, and is coindexed with the subject
of keep, as sketched in Figure 9.
In Figure 9 the tree structure is determined by the interaction of the valence
properties of the individual words and the standard combinatorial constructions of
the grammar that build noun phrases, prepositional phrases, and verb phrases. The
idiom itself has no phrasal properties. Your hat is a canonical noun phrase (deter-
miner phrase, if you prefer); under your hat is a canonical prepositional phrase;
and keep it under your hat is a canonical verb phrase. The phrase-structural re-
lations observed are determined by the constructions that license such phrases in
the grammar generally; no special phrasal machinery is necessary to license the
idiom.
Regarding the composition of the semantics in Figure 9, look first at the
FRAMES value of the root constituent. The keep frame has four arguments: s
(a Davidsonian event variable), x, y, and l2 . Note that x is both the index of the
subject valent of keep and the first argument of the possessor frame 5 , which iden-
tifies the index x of the subject valent and the index x of x’s hat (i.e. ‘head’). This
identification is effected by the joint action of (a) the provision in (64) that iden-
tifies the index of the XARG of keep’s PP[under] argument with that of its subject
argument, and (b) the property of (65) whereby the tag X identifies the XARG
of the preposition under (and therefore implicitly the PP under projects) with the
XARG of its object NP[hat]. The y variable is introduced by the it constituent,
which represents the information that is kept secret. Back in Figure 9, the l2 label
points to the i-under[in]-fr, which says that y (the secret) is in z (x’s head). The
frame tagged 9 is introduced by the your hat constituent 4 , and represents the
definiteness semantics of the possessive determiner. The frame tagged 6 assigns
the variable z to the index of the hat (head) constituent.42
42
With respect to the percolation of marking values, recall that marking and LID values are
identical within lexemes and that they percolate from heads in head-complement structures and
from functors in functor-head structures.
[ FORM ⟨keep, it, under, your, hat⟩
  SYN  VP [ CAT [LID ⟨[8]⟩], MRKG [8], VAL ⟨[1] NP_x⟩ ]
  SEM  [ FRAMES ⟨[8] l1:keep(s,x,y,l2), [7] l2:i-under[in](y,z),
                 [5] l4:poss(x,z), [9] l0:the(z,l4,l1), [6] l4:i-hat[head](z)⟩ ] ]
    daughters:
    word [ FORM ⟨keep⟩
           SYN  [ CAT verb[LID ⟨[8]⟩], VAL ⟨[1], [2], [3]⟩ ]
           SEM  [ INDEX s, FRAMES ⟨[8]⟩ ] ]
    [2]  [ FORM ⟨it⟩, SYN NP, SEM [IND y] ]
    [3]  [ FORM ⟨under, your, hat⟩
           SYN  PP [ CAT [LID ⟨[7]⟩, XARG [10] [pron]_x], MRKG [7] ]
           SEM  [ FRAMES ⟨[7], [5], [9], [6]⟩ ] ]
        daughters:
        word [ FORM ⟨under⟩
               SYN  [ CAT preposition[LID ⟨[7]⟩], VAL ⟨[4]⟩ ]
               SEM  [ FRAMES ⟨[7]⟩ ] ]
        [4]  [ FORM ⟨your, hat⟩
               SYN  NP [ CAT [LID ⟨[6]⟩, XARG [10]], MRKG [8] ]
               SEM  [ INDEX z, FRAMES ⟨[5], [9], [6]⟩ ] ]
6 Locality
The present SBCG approach to idioms maintains strict locality.43 It does not per-
mit any listeme or construction to license a dependency more distant than that
obtaining within a local tree. It thus conforms to the widespread observation that
actual dependencies across the world’s languages tend very strongly to be local,
apparent exceptions such as extraction presenting in many languages evidence of
comprising a series of local dependencies (see e.g., Sag 2007).44 We conclude
that a conservative approach to theory construction seeks a grammatical architec-
ture whose general properties make local dependencies directly expressible and
posits additional machinery to express genuinely non-local dependencies only in
the special circumstances in which they arise, if any. To adapt an aphorism about
simplicity commonly attributed to Einstein, a grammatical theory should be as
powerful as it needs to be and no more powerful.
Richter & Sailer (2009 [R&S]), based on earlier work of Sailer (2003) and
Soehn (2004), have developed a theory of idioms within constructional HPSG that
eschews a lexical and localist approach in favor of a phrasal theory that permits
idioms to establish dependencies at an arbitrary depth.45 Although we find much
of value in their treatment of a difficult grammatical question, there is one aspect
of their theory that we find unnecessarily powerful. One mechanism by which
R&S license such dependencies is by allowing the signs that appear on the DTRS
list to contain a DTRS feature. Consequently, a phrasal sign can be defined to
specify, for example, that a certain idiom word is a daughter, of a daughter, of a
daughter . . . of an argument of a specified idiom predicator. The R&S theory of
decomposable idioms would thus permit the licensing of a spill-the-beans-type
phrasal idiom with idiomatic listemes spill (‘divulge’) and beans (‘secrets’) that
could license sentences such as (66).
(66) a.*Someone spilled that I forgot to remind everyone not to divulge the
beans.
b.*Kim spilled that Marion had predicted that Sandy would forget to tell
Pat not to divulge the beans.
43
We would like to thank Frank Richter and Manfred Sailer for discussion of some of the matters
treated in this section. They are of course not responsible for such errors as we have persisted in.
44
To be sure, there exist highly “nonconfigurational” languages, e.g., Warlpiri and other Pama-
Nyungan languages, whose loosely constrained word orders require machinery not necessary for
English. One example is the HPSG Linearization Theory originally proposed for German by Reape
(1994), further developed by several workers, and specifically adapted within a localist and con-
structional HPSG for Warlpiri by Donohue & Sag (1991; see also the references there).
45
“Being able to refer to deeply embedded parts of a phrase in [an idiom of a type to be discussed
below] is an important ingredient to this theory” (R&S p. 18). It is not clear to us whether R&S
intend the suspension of locality to apply only to idioms or to the grammar generally. If the former,
they do not specify how the restriction to idioms is to be implemented.
That is, the R&S theory would permit there to be an idiomatic lexeme beans
whose entry in the lexicon would specify that beans is dominated at an arbitrary
depth by an idiomatic spill whose ARG - ST requires only a sentential complement.
Of course, the R&S theory would permit the lexicon to refrain from including
such items but it would not forbid it. A theory such as the one we have presented,
which does not permit the licensing of dependencies of arbitrary depth, captures
the generalization that sentences like (66) do not occur.
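The locality point can be illustrated with a deliberately simplified Python sketch. It assumes, for illustration only, that i-frames are identifiable by an "i-" prefix and that each head lists the i-frames it may govern; `Node` and `locally_licensed` are hypothetical names, and the root of the structure is not itself checked.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Node:
    lid: str                                  # e.g. "i-beans-fr" or "divulge-fr"
    selects: FrozenSet[str] = frozenset()     # i-frames this head may govern
    args: Tuple["Node", ...] = ()             # its immediate arguments only


def is_idiomatic(lid: str) -> bool:
    # Assumption of this sketch: i-frames carry an "i-" prefix
    return lid.startswith("i-")


def locally_licensed(node: Node) -> bool:
    """Every idiomatic argument must be selected by its immediate governor."""
    for arg in node.args:
        if is_idiomatic(arg.lid) and arg.lid not in node.selects:
            return False                      # no look-up beyond the local head
        if not locally_licensed(arg):
            return False
    return True


beans = Node("i-beans-fr")
plain_spill = Node("i-spill-fr", frozenset({"i-beans-fr"}), (beans,))

divulge = Node("divulge-fr", args=(beans,))
forget = Node("forget-fr", args=(divulge,))
distant_spill = Node("i-spill-fr", frozenset({"i-beans-fr"}), (forget,))

print(locally_licensed(plain_spill))    # True
print(locally_licensed(distant_spill))  # False
```

The second structure is a schematic stand-in for (66): idiomatic beans is governed by divulge, which does not select it, and the check has no way of consulting the higher spill.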
R&S’s abandonment of locality is not casual. They find it to be necessitated
by certain facts. The particular data which R&S argue require dependencies of
arbitrary depth consist in a number of German idioms in which it appears that
coreference is required between a matrix subject and a non-subject constituent of
a complement clause. Examples are given in (67), in which the constituent marked
X is interpreted as coreferent with the matrix subject.
R&S question the adequacy of a theory of the kind proposed here—as briefly
sketched in Sag (2012 [cited by R&S in a preprint version dated 2007])—based on
the observation that Sag (2012) specifies that the external argument of a predicator
is its subject. Consequently non-subject constituents such as those indicated with
X in (67) are not visible to the matrix subject. If SBCG allows only subjects to
serve as external arguments and the interpretation of coreference between matrix
subject and complement non-subject in (67) is dictated by the grammar rather than
the result of a pragmatic process, then the SBCG approach cannot account for the
facts of (67). R&S suggest that the SBCG approach might be saved by allowing the
ARG - ST to float to the top of a phrase, which would make embedded arguments
visible to the matrix subject at the cost of abandoning locality. They continue,
however, by presenting facts that they consider to pose a problem for even the
anti-locality solution just proposed. Interestingly their evidence for challenging
that (to our mind undesirable) solution suggests one of two possible defenses of
the strictly localist, SBCG approach. R&S cite the English idiom illustrated in (68).
In this case, “X is embedded in a locative modifier. Unless locative modifiers are
on the ARG - ST list, the locality assumptions of SBCG do not seem to leave the
necessary kind of structure accessible to enforce coreference between X and the
matrix subject” (R&S p.19).
(68) look as if butter wouldn’t melt [in X’s mouth] (‘look completely innocent’)
R&S (32)
But is the word look part of this idiom? The examples in (69) could be multiplied.
(69) a. This horror novella depicts the fictional life of; [sic] Ethel; a sweet-
looking little old lady, who seems as if butter wouldn’t melt in her
mouth. . .
b. Now Sun is almost 5 and he’s just about the cheekiest monkey out there,
and Shine is 2 and a half and most of the time appears as if butter
wouldn’t melt in her mouth.
c. . . . a brat who pulls all kinds of mischief and then acts as if butter
wouldn’t melt in his mouth.
d. . . . where she sits and sulks for several minutes before returning to the
bedroom to pretend that butter wouldn’t melt in her mouth.
In section 4 we cited analogous evidence that—contrary to what one finds
frequently claimed—there is no matrix verb in the side-bread-buttered-on id-
iom. Similarly, the idiom illustrated in (68) and (69) is better analyzed as involv-
ing the words butter, wouldn’t, melt, and mouth, but not any particular clausal-
complement-taking verb, as most clearly demonstrated in (69e,f,g,h). Thomas
Wasow has suggested (pc) that idioms of clausal form may carry a pragmatic re-
quirement that one of the constituents therein is contextually bound. In the case
of (69h), the contextual antecedent is evidently the bullies. In (69g) there is no
controlled pronoun. In (69e,g, and h) there is no higher verb. In (69e) the prag-
matically recoverable antecedent is the person the addressee is asked to look at.
Another English analogue to the German idioms that appear to require corefer-
ence between a matrix verb and a non-subject argument of its complement might
be the idiom illustrated in (70).
(70) My friend didn’t know what hit him when I poured a bucket of water over
his head.
It is common for putatively authoritative sources to include the matrix verb know
in the idiom. For example, the Cambridge Dictionaries Online gives the idiom
as “not know what has hit you.” Wiktionary lists it as “not know what hit (one),”
and the Free Dictionary by Farlex (online) gives it as “not know what hit you.”
However, the idiom what hit s.b. occurs both as a complement to many predicators
other than know, and freestanding.
(71) a. National Affairs: ∅ What Hit Him?. . . What had happened to Harold
Stassen in the Nebraska primary?
b. Editorial: Obama still trying to figure out what hit him.
c. “The people are full of anguish, like they don’t understand what hit
them,” Solar TV News reporter David Santos recounted.
d. Ace cueist Pankaj Advani looked uncomfortable in an all-India level
final, making faces, raising eyebrows, trying to make out what hit him
...
e. An excellent multiplayer weapon when used with the gauss - most play-
ers can’t tell what hit them!
f. With the right medical attention, and proper drugging, Lady CBC will
not remember what hit her.
g. We were filming a scene where the Indians attack the fort when I suddenly
developed a splitting headache. I can’t imagine what hit me.
h. Even today, despite numerous works on the crisis—some of them excel-
lent most Americans remain perplexed by what hit them.
We are thus led to imagine the possibility in the case of (67)b, for example,
that the idiom might be restricted to Xacc tritt ein Pferd plus a pragmatic condition
“that it can only be used in contexts describing the mental state of the referent of
the accusative pronoun” (Thomas Wasow pc). We envision the possibility that—
for English at least—there may be no idioms that grammatically require corefer-
ence of a matrix argument with a non-subject of a complement clause and hence
no empirical motivation for a formulation that abandons locality. We tentatively
suggest that something of the sort could also be true of German. The first de-
fense of the lexical approach is therefore the suggestion that possibly there are no
idioms which require coindexation of a non-subject argument of a complement
clause with an argument of a governing verb, the apparent counterexamples con-
taining instead a pragmatic requirement that a pronominal non-subject argument
of a complement clause be bound by a contextually given antecedent—which of-
ten happens to be an argument (usually the subject) of the matrix predicate.
We have not made a full study of the relevant facts of either German or En-
glish (or any other language) and so do not propose this possibility as more than
a suggestion for future research. Let us suppose on the contrary that there are
in fact idioms which require coreference between a matrix subject (or other ar-
gument) and a complement non-subject. We can adjust our theory of idioms to
account for this fact (assuming it is a fact) without abandoning locality. Accord-
ingly, we allow a non-subject argument of an idiom predicator—and only of an
idiom predicator—to serve as external argument. Idioms are, after all, idiomatic;
so it should not occasion great surprise to find that they display a degree of id-
iosyncratic behavior, especially since the argument made visible from above is
usually (if not always) the only “free” argument of the idiom predicator, i.e., the
only argument of the idiom predicator not realized by an idiom word or idiom-
word-headed phrase. By allowing a non-subject argument of an idiom predicator
to serve as external argument we retain locality without sacrificing coverage of
inter-clausal coreference.
We stated above that the SBCG grammar signature adopted here distinguishes
canonical signs from idiomatic signs. Specifically, the LID value of a canonical
sign contains no i-frames.
(72) canonical-sign: sign & [LID list(c-frame)]
An idiomatic sign on the other hand contains at least one i-frame in its LID value.
(73) idiomatic-sign: sign & [LID ⟨i-frame⟩ ◦ list(frame)] 46
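The two definitions can be paraphrased in Python as a rough sketch. It assumes, purely for illustration, that frames are strings and that i-frames carry an "i-" prefix; `is_canonical`, `is_idiomatic`, and `shuffles` are hypothetical names. The `shuffles` function enumerates the interleavings that the shuffle operation ◦ permits.

```python
from typing import List, Sequence


def is_i_frame(frame: str) -> bool:
    # Assumption of this sketch: i-frames are named with an "i-" prefix
    return frame.startswith("i-")


def is_canonical(lid: Sequence[str]) -> bool:
    """(72): the LID list contains c-frames only."""
    return all(not is_i_frame(f) for f in lid)


def is_idiomatic(lid: Sequence[str]) -> bool:
    """(73): the LID list is a shuffle of a one-member i-frame list with some
    list of frames, i.e. it contains at least one i-frame."""
    return any(is_i_frame(f) for f in lid)


def shuffles(a: Sequence[str], b: Sequence[str]) -> List[List[str]]:
    """All interleavings of a and b that preserve each list's internal order."""
    if not a:
        return [list(b)]
    if not b:
        return [list(a)]
    return ([[a[0]] + rest for rest in shuffles(a[1:], b)]
            + [[b[0]] + rest for rest in shuffles(a, b[1:])])


candidates = shuffles(["i-beans-fr"], ["f1-fr", "f2-fr"])
print(len(candidates))                             # 3
print(all(is_idiomatic(c) for c in candidates))    # True
print(is_canonical(["spill-fr"]))                  # True
```

Every list produced by shuffling a one-member i-frame list into a list of frames counts as idiomatic, whatever position the i-frame ends up in.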
Idiom-predicator verbs are defined as having ARG - ST lists containing at least one
idiomatic sign and allowing any argument to be identified as the XARG. For con-
venience, we first define an idiomatic-argument-list (idiom-arg-list) as a list of
signs at least one of which is idiomatic.
R&S analyze example (67)b to illustrate the treatment in their framework of the
problematical coreference of the matrix subject with a complement non-subject.
46
The symbol ◦ indicates the sequence union (or shuffle) operation (Reape 1994). Shuffle constructs
a new list from two or more original lists. If L1, ..., Ln are lists, then any list containing
all and only the members of L1, ..., Ln in which the precedence relations in each of L1, ..., Ln are
conserved is a shuffle of L1, ..., Ln. (The shuffling together of two or more packets of playing cards
provides the image that motivates the name.)
47
This construction will not do for German since German has verbs that lack subjects. “The
first nominal argument with structural case is the XARG, if there is any” (Stefan Müller pc).
48
The symbol ⊕ denotes the append operation, which concatenates two lists. Given lists L1 =
⟨a1, ..., an⟩ and L2 = ⟨b1, ..., bm⟩, L1 ⊕ L2 = ⟨a1, ..., an, b1, ..., bm⟩.
The SBCG analysis of this idiom follows. It is convenient to start from the bot-
tom up, with the idiomatic word Pferd (lit. ‘horse’), which we take to contribute
nothing to the meaning of the idiom.49
cn-lexeme
[ FORM ⟨pferd⟩
  SYN  [ CAT [LID ⟨i-pferd-fr⟩]
         VAL ⟨⟩ ]
  SEM  [ INDEX  none
         FRAMES ⟨⟩ ] ]
The remainder of the idiom consists of two listemes, i-treten (lit. ‘kick’) and a
listeme which R&S term surprise-glauben (hereafter s-g), whose paradigm con-
tains forms of both glauben and denken, the choice of stem determined principally
by tense. This listeme “combines with complement clauses that express (negative)
surprises, astonishment, or annoyance” (R&S p. 17). Surprise-glauben occurs in
idioms of this general character, but also with non-idiomatic complements. Exam-
ples (77b) are presented by R&S to illustrate both idiomatic (a) and non-idiomatic
(b) uses of s-g.
b. da muss jetzt echt alles nochmal neu gemacht werden.
this must now really all again new made be
(‘. . . this must all be redone [annoyed]’)
R&S insist, however, that s-g must be considered part of the treten-ein-Pferd
idiom. “Surprise-glauben is an instance of a (special) attitude predicate that also
occurs outside of idioms. For this reason, the matrix predicate need not be re-
stricted to a particular PCl [phraseological clause, e.g., clausal complement of
s-g]. However, the PCl in Figure 3 [depicting the R&S analysis of the idiom
clause: X-acc treten ein Pferd] must be collocationally bound to this special matrix
predicate. . .” (p. 17). In our analysis, the existence of a surprise-glauben
that takes non-idiom complements along with other s-gs that take particular idiom
complements is represented in the type hierarchy with a general s-g lexeme im-
mediately dominating the s-g-lexeme that takes non-idiom complements as well
as an additional s-g lexeme for each idiomatic complement.
The surprise-glauben lexeme appropriate to this idiom, s-g-lxm (. . . ein-Pferd),
subcategorizes for an NP subject. It identifies its subject NP with the XARG of its
clausal i-treten-headed complement, and identifies its semantics with that of that
complement.
(80) s-g-lxm(. . . ein Pferd)
     [ ARG-ST ⟨NP_i, S[SYN [CAT [LID ⟨i-treten[astonish]-fr⟩, XARG [SEM [INDEX i]]]], SEM X]⟩
       SEM    X ]
To summarize this section: (1) We have suggested that there may exist a prag-
matic alternative to R&S’s interpretation of certain German idioms as syntacti-
cally requiring coreference of a matrix subject and a complement non-subject. We
have suggested instead that the coreference observed may represent a pragmatic
property (conventional implicature) of these idiomatic constructions. (2) We have
shown that, if R&S’s syntactic interpretation of the observed cases of coreference
is correct, then with a small alteration the SBCG lexical and localist approach can
account for the data.
Although we disagree with R&S with respect to the matters focused on most
closely in this section, we are entirely in sympathy with their (otherwise) success-
ful effort to incorporate into an explicit theory of grammar a theory of idioms that
takes account of the extensive and complex facts regarding partial productivity
and partial compositionality that idioms present.
(81) Pat pulled [the strings [that got Chris the job]]. (=NSW (33)a)
(82) [The strings [that Pat pulled]] got Chris the job. (=NSW (33)b)
Example (81) does not cause a problem for the approach to idioms advanced here,
because the idiom word strings is not governed by a non-idiom predicator.51
Example (82), however, does present a problem for the current analysis be-
cause in this case the nominal phrase headed by i-strings is an argument of the
canonical predicator got. Speaking informally, pulled in the relative clause seems
somehow to license the token of i-strings that is governed in the main clause by
got. The construction in (83) licenses nominal phrases which, although headed
by idiom words, can nonetheless serve as arguments of canonical predicators, for
example, the strings that Pat pulled in (82).52
(83) Canonical Idiomatically-Headed Nominal Construction (↑noun-lxm)
     canon-idiom-hd-nom-cxt ⇒
     [ MTR  [ noun, SYN [CAT [LID ⟨c-frame⟩]] ]
       DTRS ⟨ X: [ noun, SYN [CAT [LID ⟨i-frame⟩]] ],
              [ rel-cl, SYN [CAT [SELECT X]] ] ⟩ ]
The construction in (83) simply codes the fact that a noun or nominal phrase with
[LID i-frame] can, when modified by a relative clause, head a nominal phrase with
[LID c-frame] and thus serve as an argument for a canonical predicator, as in (82).
If this isolated fact can be found in future research to follow from something more
general, that development will of course be welcome. Meanwhile, a construc-
tional approach such as SBCG, while always seeking the smallest number of the
most general constructions the facts permit, provides a theoretical vehicle to ac-
count for isolated facts of this kind without departing from explicit representation.
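The effect of (83) can be paraphrased procedurally in a short Python sketch. This is only an informal rendering under simplifying assumptions: `Nominal`, `RelClause`, and `canon_idiom_hd_nom` are hypothetical names, i-frames are identified by an "i-" prefix, and the mother's LID is represented by the placeholder string "c-frame".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Nominal:
    form: Tuple[str, ...]
    lid: str


@dataclass(frozen=True)
class RelClause:
    form: Tuple[str, ...]
    selects: Nominal        # the nominal the clause modifies (SELECT X)


def canon_idiom_hd_nom(head: Nominal, rel: RelClause) -> Optional[Nominal]:
    """Return the mother licensed by (83), or None if the daughters do not fit."""
    if not head.lid.startswith("i-"):
        return None                     # the head daughter must be idiomatic
    if rel.selects != head:
        return None                     # the relative clause must select that head
    # The mother is canonical, so a canonical predicator can govern it
    return Nominal(head.form + rel.form, "c-frame")


strings = Nominal(("strings",), "i-strings-fr")
rel = RelClause(("that", "Pat", "pulled"), strings)
mother = canon_idiom_hd_nom(strings, rel)

print(mother.form)  # ('strings', 'that', 'Pat', 'pulled')
print(mother.lid)   # c-frame
```

An unmodified idiomatic noun gets no canonical mother from this function, which mirrors the restriction of the construction to relative-clause modification.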
One approach compatible with much of our present analysis that could obviate
the need for the above construction would be to step back from the expectation
51
Although (81) and (82) are invented examples, attested ones are easy to find. Compare (i) to
(81) and (ii) to (82).
(i) . . . the “big city” bankers who, many people were convinced, pulled the strings that manip-
ulated the political system as well as the economy.
(ii) . . . the daughter was not aware of the strings that were being pulled for her.
52
And the strings that were being pulled for her in (ii) of the preceding note.
that idioms are licensed by the same process of unification that admits or rejects
local syntactic phrasal structures. As Nunberg et al. (1994) argue convincingly,
idioms express dependencies which are in part semantic, and hence the entries
in an ‘idiom lexicon’ might best be analyzed as well-formedness conditions on
the semantic representations of whole sentences, not ones that can be checked
for satisfiability phrase by phrase. The conventional lexicon would still contain
idiomatic entries that express any morphological or syntactic idiosyncrasies, and
would provide a necessary but not sufficient set of constraints on the licensing of
idioms. On this approach, the conventional lexicon and inventory of constructions
would provide a syntactic analysis of an infelicitous expression such as he will
kick the pail using the idiomatic lexical entry for kick, but the entry in the (se-
mantic) idiom lexicon for kick the bucket would reject the sentence because the
predication for idiomatic kick would be required to take idiomatic bucket as its
second argument. The principal advantage of this dual approach to the licensing
of idioms is that for McCawley’s example the strings that were being pulled for
her, all of the necessary semantic dependencies, coming from multiple clauses,
are present at the point where the licensing of idiomatic semantic dependencies is
determined, namely once the full syntactic analysis of the sentence is complete.
One obvious drawback is the need to posit a separate semantics-based idiom lex-
icon along with a well-formedness check that would be applied on the semantics
composed for whole sentences, not phrase by phrase. If further investigation of
idiomatic expressions uncovers more varieties of cross-clausal dependencies, the
advantages of this two-phase licensing of idioms may come to justify the added
complexity in such a framework.
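The second phase of such a dual approach might be sketched as follows. This is a hypothetical Python illustration, not a proposal about the actual form of a semantic idiom lexicon: `IDIOM_LEXICON`, `semantically_licensed`, the frame names, and the flat representation of predications as (frame, arguments) pairs are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Predication = Tuple[str, Tuple[str, ...]]   # (frame name, argument indices)

# Hypothetical semantic idiom lexicon: for each idiomatic predicate, the
# argument position it constrains and the frame that argument must bear.
IDIOM_LEXICON: Dict[str, Tuple[int, str]] = {
    "i-kick-fr": (1, "i-bucket-fr"),
    "i-pull-fr": (1, "i-strings-fr"),
}


def semantically_licensed(preds: List[Predication]) -> bool:
    """Check idiom constraints against the predications of a whole sentence."""
    frames_of: Dict[str, Set[str]] = {}
    for frame, args in preds:
        if len(args) == 1:                   # one-place (nominal) predications
            frames_of.setdefault(args[0], set()).add(frame)
    for frame, args in preds:
        if frame in IDIOM_LEXICON:
            position, required = IDIOM_LEXICON[frame]
            if len(args) <= position:
                return False
            if required not in frames_of.get(args[position], set()):
                return False
    return True


kick_the_pail = [("i-kick-fr", ("x", "y")), ("pail-fr", ("y",))]
strings_pulled = [
    ("i-strings-fr", ("y",)),
    ("i-pull-fr", ("x", "y")),        # from the relative clause
    ("get-fr", ("y", "z", "w")),      # from the main clause
]

print(semantically_licensed(kick_the_pail))   # False
print(semantically_licensed(strings_pulled))  # True
```

Because the check runs over the predications of the full sentence, the dependency between pull and strings is satisfied regardless of which clause each predication comes from.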
8 Conclusion
We have outlined a lexical theory of phrasal idioms. A pervasive property of flex-
ible idioms is that the words that appear in citation forms unmodified are often
modified in actual use. This modification is theoretically crucial because it shows
that the modified words are not merely words phonologically but also semanti-
cally and syntactically. In the following examples, we have confined ourselves to
internal modification (ignoring external modifiers like proverbial, metaphorical,
literal, figurative, etc. and epithets like danged or confounded). In the (84) examples
idiomatic cat is modified; in the (85) examples idiomatic bag is modified. We
present what may seem like an excessive number of (attested) examples to illus-
trate the point that ordinary modification of idiom words is commonplace.
Modifiability of idiom words demonstrates that they are real words with real meanings,
which have to be combined to produce meaningful phrases and sentences by the
same grammatical machinery that combines the meanings of ordinary words.
(84) a. I’ll bet there might even be more than a few wealthy Palm Beachers
who also have not let the financial cat out of the bag to everyone in
their family.
b. Now, however, it has to be fleshed out, and in that respect you have not
let the whole cat out of the bag.
c. We’re one day closer to Capcom Cup and it’s definitely time to let the
final cat out of the bag with the Super Street Fighter 4 AE brackets.
d. Heather said that Blossom almost let the entire cat out of the bag to-
day. . . read on if you want a glimpse of what is coming.
e. Hamid Karzai has let the Pentagon’s cat out of the bag—to the displea-
sure of the Obama Administration.
f. So, without further ado, it brings me great pleasure to finally let the
(first) cat out of the bag and announce today, the Fritz & Fräulein Col-
lection will be featured and sold at Tommy Hilfiger’s beautiful new flag-
ship store,. . .
g. “Might as well let the other cat out of the bag,” he said: . . .
h. Solo let the Olympic’s cat out of the bag when she narrated her experi-
ence to the press.
i. Don’t let the last cat out of the bag.
j. Now, however, it has to be fleshed out, and in that respect you have not
let the whole cat out of the bag.
k. Well the liberal Austin bloggers let the political cat out of the bag.
l. Georgia, always the truth bringer of the family, let the cinematic cat out
of the bag, when she allowed that the picture was not such an ordeal as
it would have been had their lives been filmed for the entire year they
spent on the island. . . .
(85) a. Time to let the cat out of the Instagram bag. . . I’m pregnant!
b. Who let the cat out of the Bloomingdales bag?
c. Kandi Burruss may have let the cat out of the Bravo bag last night dur-
ing “Watch What Happens Live.”
d. Someone let the cat out of the Duluth Pack bag a week early.
e. . . . a fetching real estate purveyor down in Miami Beach who kindly let
the cat out of the celebrity real estate bag about six-time Grammy-
winning singer/songwriter Billy Joel. . .
f. So, Mary Burke has finally let the cat out of the cellophane bag by
formally announcing her candidacy for governor.
g. . . . and this week the folks at Zillow let the cat out of the real estate bag
h. Yup, I let the cat out of the family bag. Probably the whole reason I am
straightforward. . .
i. Now that director Wes Ball has let the cat out of the concept art bag,
j. Perhaps sensing that he’d let the cat out of the plot bag a little early,
King then told Cronenberg and the audience that he wasn’t completely
committed to the new novel . . .
k. Good morning everyone! Time to let the cat out of the internet bag.
l. A little less surprising now you’ve let the cat out of the fur-trim, dia-
mante embellished clutch bag, but we suppose it’s the thought that
counts.
m. But a guy in Philadelphia (not Mississippi, Pennsylvania this time) let
the cat out of the voter suppression bag:. . .
n. But some believe it is because some major problems are on the way and
NASA does not want other Scientists to let the cat out of the cosmic bag
to [sic] soon.
The modifiability of idiom words argues for a lexical theory of idioms. Id-
iom words are real words; they share the morphological and phonological proper-
ties of their literal counterparts through multiple inheritance and contribute their
own idiomatic meanings. Often these meanings have been motivated historically
by metaphors or metonymies of varying degrees of currency. Idiom words are
combined with ordinary words and phrases into the phrases and sentences of the
grammar by the familiar combinatorial constructions; no special combinatorial
machinery is required for idioms. Once idiom words are defined and idiom pred-
icators appropriately distinguished from canonical predicators, nothing further is
needed. The ordinary combinatorial constructions that are required for the regular
grammar suffice. We suggest that in the past insufficient attention has been paid
to the description of idioms within explicit theories of grammar, and we have attempted
to remedy this here within the approach of Sign-Based Construction Grammar.
In particular, we have shown that SBCG can provide an analysis in which
the occurrence of an idiom word is predicted to be restricted to precisely the limited
range of environments in which it is observed to occur, and that such prediction can
be achieved in a formal approach that abjures movement, empty categories, and
dependencies of arbitrary depth.
Our theory of idioms is lexicalist because we find that syntactically flexible
idioms are semantically compositional: their meanings are put together from the
meanings of special idiom words according to the same processes—combinatorial
constructions—that put together the meanings of ordinary phrases and sentences.
A corollary of this finding is that syntactically inflexible idioms are inflexible
precisely because (some of) the words of which they are composed, being mean-
ingless, fail to meet the semantic requirements of the constructions that would
provide their flexibility. A welcome result of the commitment to a lexical ap-
proach is that little has had to be added to existing theory to account for the data
of idioms.
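As an informal illustration only, the following Python sketch mimics the two conclusions above: idiom words share the form of their literal counterparts and are selected by idiom predicators, and a meaningless idiom word blocks a construction that has a semantic requirement. It is not the SBCG formalism. The names Lexeme, lid, selects_lid, idiom_variant, can_take_object and can_passivize are hypothetical, and reducing the semantic requirement of the passive to "the object has a meaning" is a simplifying assumption made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lexeme:
    form: str                          # shared morphology/phonology
    lid: str                           # hypothetical lexical identifier
    idiomatic: bool = False
    meaning: Optional[str] = None      # None models a meaningless idiom word
    selects_lid: Optional[str] = None  # object LID an idiom predicator demands


def idiom_variant(literal: Lexeme, lid: str, meaning: Optional[str],
                  selects_lid: Optional[str] = None) -> Lexeme:
    """Build an idiom word that reuses the form of its literal counterpart."""
    return Lexeme(literal.form, lid, True, meaning, selects_lid)


def can_take_object(verb: Lexeme, noun: Lexeme) -> bool:
    """One ordinary head-complement check serves literal and idiom verbs."""
    if verb.selects_lid is not None:   # idiom predicator: exact idiom word
        return noun.lid == verb.selects_lid
    return not noun.idiomatic          # canonical predicator: no idiom words


def can_passivize(verb: Lexeme, noun: Lexeme) -> bool:
    """Simplifying assumption: passive needs a meaningful promoted object."""
    return can_take_object(verb, noun) and noun.meaning is not None


# Literal lexemes (illustrative entries only).
spill = Lexeme("spill", "spill_lit", meaning="spill")
beans = Lexeme("beans", "beans_lit", meaning="beans")
kick = Lexeme("kick", "kick_lit", meaning="kick")
bucket = Lexeme("bucket", "bucket_lit", meaning="bucket")

# Idiom words: same forms, their own (or no) meanings.
i_beans = idiom_variant(beans, "beans_idiom", "secret(s)")
i_spill = idiom_variant(spill, "spill_idiom", "reveal", selects_lid="beans_idiom")
i_bucket = idiom_variant(bucket, "bucket_idiom", None)
i_kick = idiom_variant(kick, "kick_idiom", "die", selects_lid="bucket_idiom")
```

In this sketch, can_passivize(i_spill, i_beans) holds while can_passivize(i_kick, i_bucket) does not, although both idiom verbs combine with their objects through the same check that literal verbs use.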
References
Abeillé, Anne. 1995. The flexibility of French idioms: A representation with Lex-
ical Tree Adjoining Grammar. In M. Everaert, E.-J. v. d. Linden, A. Schenk, and
R. Schreuder, eds., Idioms. Structural and Psychological Perspectives, pages
15–42. Lawrence Erlbaum Associates, Hillsdale.
Allegranza, Valerio. 2007. The Signs of Determination. Constraint-Based Mod-
elling Across Languages, vol. 16 of SABEST: Saarbrücker Beiträge zur Sprach-
und Translationswissenschaft. Frankfurt am Main: Peter Lang.
Boas, Hans C. and Ivan A. Sag, eds. 2012. Sign-Based Construction Grammar.
Stanford, CA: CSLI Publications.
Cacciari, Christina and Patrizia Tabossi. 1988. The comprehension of idioms.
Journal of Memory and Language 27:668–683.
Chaves, Rui Pedro. 2012. On the grammar of extraction and coordination. Natural
Language and Linguistic Theory 30(2):465–512.
Colombo, Lucia. 1993. The comprehension of ambiguous idioms in context. In
C. Cacciari and P. Tabossi, eds., Idioms: Processing, structure and interpreta-
tion, pages 163–189. Hillsdale, NJ: Erlbaum.
Copestake, Ann, Dan Flickinger, Carl J. Pollard, and Ivan A. Sag. 2005. Minimal
recursion semantics: an introduction. Research on Language and Computation
3(4):281–332.
Donohue, Cathryn and Ivan A. Sag. 1999. Domains in Warlpiri. Paper pre-
sented at HPSG-99 – University of Edinburgh. Non-final draft available at
http://www.cs.toronto.edu/~gpenn/csc2517/donohue-sag99.pdf.
Fillmore, Charles J. 1982. Frame semantics. In Linguistics in the Morning
Calm, pages 111–137. Seoul: Hanshin Publishing Co.
Fillmore, Charles J. 1985. Frames and the semantics of understanding. Quaderni
di Semantica 6:222–254.
Fillmore, Charles J. 1986. Pragmatically controlled zero anaphora. In Proceedings
of the Twelfth Annual Meeting of the Berkeley Linguistics Society, pages 95–
107. BLS.
Fillmore, Charles J. and Colin Baker. 2010. A frames approach to semantic anal-
ysis. In B. Heine and H. Narrog, eds., The Oxford Handbook of Linguistic
Analysis, pages 313–340. Oxford: Oxford University Press.
Fillmore, Charles J., Paul Kay, and Mary C. O’Connor. 1988. Regularity and
idiomaticity in grammatical constructions: The case of let alone. Language
64:501–538.
Hillert, Dieter and David A. Swinney. 2001. The processing of fixed expressions
during sentence comprehension. In Conceptual structure, discourse, and lan-
guage, pages 107–121. Stanford, CA: CSLI Publications.
Kay, Paul and Charles J. Fillmore. 1999. Grammatical constructions and linguistic
generalizations: The What’s X doing Y? construction. Language 75(1):1–33.
Kay, Paul and Ivan A. Sag. 2012. Discontinuous dependencies and complex deter-
miners. In H. C. Boas and I. A. Sag, eds., Sign-Based Construction Grammar,
pages 211–238. CSLI Publications.
McCawley, James D. 1981. The syntax and semantics of English relative clauses.
Lingua 53(2):99–149.
Nunberg, Geoffrey, Ivan A. Sag, and Thomas Wasow. 1994. Idioms. Language
70(3):491–538.
O’Grady, William. 1998. The syntax of idioms. Natural Language and Linguistic
Theory 16:279–312.
Osborne, Timothy. 2012. Edge features, catenae, and dependency-based minimal-
ism. Linguistic Analysis 34:321–366.
Pollard, Carl J. and Ivan A. Sag. 1994. Head-Driven Phrase Structure Grammar.
Chicago: University of Chicago Press.
Reape, Mike. 1994. Domain union and word order variation in German. In J. Ner-
bonne, K. Netter, and C. J. Pollard, eds., German in Head-Driven Phrase Struc-
ture Grammar, no. 46 in CSLI Lecture Notes, pages 151–197. Stanford Univer-
sity: CSLI Publications.
Richter, Frank and Manfred Sailer. 2009. Phraseological clauses in constructional
HPSG. In Proceedings of the 16th International Conference on Head-Driven
Phrase Structure Grammar, University of Göttingen, Germany, pages 297–317.
Stanford: CSLI Publications.
Rommers, Joost, Ton Dijkstra, and Marcel C. M. Bastiaansen. 2013. Context-
dependent semantic processing in the human brain: Evidence from idiom com-
prehension. Journal of Cognitive Neuroscience 25:762–776.
Sag, Ivan A. 2010a. English filler-gap constructions. Language 86:486–545.
Sag, Ivan A. 2010b. Feature geometry and predictions of locality. In G. Corbett
and A. Kibort, eds., Features: Perspectives on a Key Notion in Linguistics,
pages 236–271. Oxford: Clarendon Press.
Sag, Ivan A. 2012. Sign-based construction grammar: An informal synopsis. In
H. C. Boas and I. A. Sag, eds., Sign-Based Construction Grammar, pages 69–
202. Stanford, CA: CSLI Publications.
Sag, Ivan A., Timothy Baldwin, Francis Bond, Ann Copestake, and Daniel P.
Flickinger. 2002. Multiword expressions: A pain in the neck for NLP. In Pro-
ceedings of the Third International Conference on Intelligent Text Processing
and Computational Linguistics (CICLING 2002), pages 1–15. Mexico City,
Mexico.
Sag, Ivan A., Thomas Wasow, and Emily M. Bender. 2003. Syntactic Theory: A
Formal Introduction. Stanford: CSLI Publications, 2nd edn.
Sailer, Manfred. 2003. Combinatorial semantics and idiomatic expressions in
Head-Driven Phrase Structure Grammar. Phil. Dissertation (2000). Arbeitspa-
piere des SFB 340. 161, Universität Tübingen.
Soehn, Jan-Philipp. 2004. License to COLL. In S. Müller, ed., Proceedings of
the HPSG-2004 Conference, Center for Computational Linguistics, Katholieke
Universiteit Leuven, pages 261–273. Stanford: CSLI Publications.
Sprenger, Simone A., Willem J. M. Levelt, and Gerard Kempen. 2006. Lexical
access during the production of idiomatic phrases. Journal of Memory and
Language 54:161–184.
Swinney, David A. 1981. Lexical processing during sentence comprehension: Ef-
fect of higher order constraints and implications for representation. In T. Mey-
ers, J. Laver, and J. Anderson, eds., The cognitive representation of speech,
Advances in Psychology Series, pages 201–209. Amsterdam: North-Holland.
Van Eynde, Frank. 2006. NP-internal agreement and the structure of the noun
phrase. Journal of Linguistics 42(1):139–186.
Van Eynde, Frank. 2007. The big mess construction. In S. Müller, ed., The Pro-
ceedings of the 14th International Conference on Head-Driven Phrase Structure
Grammar, Stanford University, pages 415–433. Stanford: CSLI Publications.
Wasow, Thomas, Ivan A. Sag, and Geoffrey Nunberg. 1984. Idioms: An interim
report. In S. Hattori and K. Inoue, eds., Proceedings of the 13th International
Congress of Linguistics (Tokyo), pages 102–115. The Hague: CIPL.