Phonological Production in Taiwan Sign Language*
James Myers, Hsin-hsien Lee, and Jane Tsay
National Chung Cheng University
This paper describes an experiment on the production of handshape change in
Taiwan Sign Language using the implicit priming experimental paradigm (Meyer
1990, 1991). The results provide new evidence not only that phonological form
plays an important role in sign production, but also that the time course of sign
production closely matches that predicted by a prominent model of spoken word
production (Levelt, et al. 1999). The experiment further highlights important
methodological considerations in the study of phonological production, not only for
sign language, but for spoken language as well.
Key words: sign language, Taiwan Sign Language, phonology, psycholinguistics
1. Introduction
A key function of language is to transmit mental representations through a physical
medium, and the role of phonology is to perform the translations closest to the border
between the mental and the physical. For functional reasons, then, sign languages require
phonology just as much as spoken languages do.1 Research on sign language phonology has
in fact flourished over the past few decades (see overviews in Klima and Bellugi 1979;
Padden and Perlmutter 1987; Liddell and Johnson 1989; Coulter 1993; van der Hulst and
Mills 1996; Lucas and Valli 2000; Sandler 2000; among numerous other books, journal
articles, and dissertations). This research has demonstrated that in addition to functional
similarities, sign language phonology also shares essential formal properties with spoken
language phonology, revealing that all human languages involve an “abstract system
underlying the selection and use of minimally contrastive units” (Corina 1990:27). In sign
languages, these contrastive units include handshapes, which behave like phonemes or
distinctive features (as first demonstrated by Stokoe 1960 and Stokoe, et al. 1965 for
American Sign Language [ASL]). Thus a pair of words may be distinguished solely by the
fact that one involves making a fist while the other involves making an open handshape, and
each of these handshapes may appear in many other words unrelated in meaning (i.e. the
handshapes are phonological rather than morphological units). Moreover, just as in spoken
languages, the arrangement of units is also phonologically important in sign languages. Thus
a sequence of different handshapes may appear within a single word (e.g. fist to open, or open
to fist), and also like spoken languages, not all logically possible arrangements are
* Thanks to the signers who participated, Ku Yu-Shan for signing in the illustrations in this paper, David Corina
and Chiu Jung-Shuang for help choosing the response time measure, Chen Jenn-yeu for advice on the implicit
priming paradigm, Wang Wenling for help designing the experimental program, Tsao Hsiu-chien for help
running the experiment, and the audience at the International Symposium on Taiwan Sign Language Linguistics
for comments. I am also indebted to the two reviewers for their feedback, in particular David Corina, who did not
wish to remain anonymous. The research was funded by National Science Council grant NSC 91-2411-H-194002.
1 Stokoe (1960) introduced the term cherology (cher = hand) for this interface system in sign languages, but
linguists eventually realized there was no need to be tied to the etymology of the term phonology (“study of
sound”), any more than etymology determines the synchronic use of linguistic nomenclature like morphology
(“study of shape”) and syntax (“arranging together”).
grammatical (Sandler 1989, 1990; Brentari 1990, 1998; Corina 1990, 1993; Uyechi 1996).
The functional and formal similarities between spoken and sign language phonologies
suggest that they may be processed in similar ways as well. When preparing to produce a
word, for example, signers should mentally activate similar types of phonological
representations and carry out similar operations, in similar orders, as do speakers of
languages like Mandarin or English. At a bare minimum, word production in sign languages
should involve the activation of some aspect of phonological form, as has been well
established from research on spoken languages, both from natural slips of the tongue and
speeded reaction time experiments. To date, however, evidence for the use of phonological
form in sign production has been somewhat inconclusive. There is no doubt that phonological
form plays a role in language errors (so-called “slips of the hand”), as shown by several
studies, beginning with Klima and Bellugi (1979). Yet a speeded reaction time experiment
reported by Corina and Hildebrandt (2002) failed to show clear effects of phonological form
in ASL, raising the possibility, as these authors suggest, that modality differences between
spoken and signed languages result in deep differences in phonological processing.
This paper addresses the question of phonological production in sign language with
fresh evidence and analyses. The heart of the paper is the description of an experiment on the
production of handshape change in Taiwan Sign Language (TSL). We apply the implicit
priming experimental paradigm developed by Meyer (1990, 1991) for the study of spoken
language, and which we apply here, as far as we are aware, for the first time to a sign
language. Our results provide evidence that phonological form does indeed play a role in sign
production, though as in the experiment reported by Corina and Hildebrandt (2002),
phonological forms did not affect reaction time directly. However, examining the results
within the model of word production presented in Levelt, et al. (1999), we argue that the lack
of reaction time effects is due to the experimental methodology, a conclusion that
paradoxically has quite promising implications. Not only do the overall reaction times and
pattern of error rates that we found imply that phonological production in sign language
works in a fashion entirely parallel, even down to specific temporal detail, to that in spoken
language, but in addition, analysis of our results suggests that our methods may allow
researchers to use the study of sign language to illuminate aspects of phonological production
that are more difficult to study in spoken language.
Before we describe the experiment and its results, we first provide some background on
TSL phonology (section 2) and on the study of phonological production (section 3).
Descriptions of the experiment (section 4) and its interpretation (section 5) are then followed
by a general discussion of its implications for research on phonological production in both
spoken and sign languages (section 6).
2. Handshape in Taiwan Sign Language phonology
As with all sign languages studied to date, in TSL the form of signs can be
analyzed into a relatively small inventory of basic handshapes (see lists given in Smith and
Ting 1979, 1984).2 Also like other sign languages, a subset of these handshapes appears in
sign-internal handshape changes. Here we simply illustrate these two empirical observations
using our experimental materials as examples, addressing theoretical implications only in so
far as they are relevant to our experiment.
Consider first the handshapes described in Table 1 (signs containing these handshapes
are illustrated in Appendix A). English names for signs here and throughout the paper come
from the TSL primers Smith and Ting (1979, 1984), and following the standard convention in
2 An updated list of TSL handshapes is also given in the appendix to Chang, Su, & Tai (this volume).
sign language research, names for signs are given in all capitals, to indicate that they are
merely convenient labels rather than glosses. The names for the handshapes are English
translations for the Chinese names given in Smith and Ting (1979, 1984), which are taken
from TSL words in which they prominently appear. Note that these signs are not identical to
the eponymous handshapes, since the phonological forms of actual words also require
specification of location, orientation, and movement, and in addition may involve both hands,
handshape change, and/or nonmanual (e.g. facial) features (see Smith 1989 for discussion of
distinctive features that can be used to analyze TSL more fully). To avoid confusion with
names of actual signs, names of handshapes are italicized and placed in square brackets.
Table 1. Some handshapes used in TSL3

Handshape name  Description                               Example signs in Appendix A
[ZERO]          loosely closed fist,                      ZERO, CUT CLASS
                finger tips touching thumb tip
[HAND]          flat open hand, fingers together          HAVE, WRITE, STICK, CUT CLASS
[SIX]           thumb and index extended,                 SIX, FAST
                rest of fingers closed
[SAME]          open hand with curved,                    PLACE, RENT
                spread fingers
[LÜ]            thumb tip touches index tip,              LÜ, RICE, WRITE
                rest of fingers closed
[ONE]           index extended, rest of fingers closed    RICE
[RENT]          thumb tip touches middle finger tip,      RENT, STICK
                rest of fingers open
The phonological status of these handshapes is established by two related arguments.
First, there are minimal pairs of words (signs) that are distinguished by use of these
handshapes, such as ZERO and HAVE ([ZERO] vs. [HAND]). Second, pairs of
morphologically and semantically unrelated signs can be analyzed as containing the same
handshapes (i.e. location, movement, orientation, and nonmanual features may differ while
handshape does not). For example, as is illustrated in Appendix A, the sign FAST is made
with the [SIX] handshape, but differs from the sign SIX in orientation and movement.
Similarly, the two-handed sign CUT CLASS involves both [ZERO] and [HAND] on different
hands, the sign RICE involves [LÜ] on the nondominant hand (e.g. the left hand for a right-handed signer), the sign WRITE involves [LÜ] on the dominant hand, and in the sign STICK,
[HAND] and [RENT] appear both simultaneously (on opposite hands) and sequentially (on
the dominant hand).
For readers less familiar with the sign language literature, the analytic decomposability
of these signs into handshapes is perhaps less salient than their high degree of iconicity. This
typical characteristic of sign languages is actually much less relevant to phonological
research than one might think. In addition to the fact that the meanings of signs are more
often merely “translucent” from their forms than truly “transparent” (a distinction made by
Klima and Bellugi 1979), it cannot be the case that signers derive all aspects of the physical form
of signs from meaning directly. For example, such a hypothesis would not explain why the
shape of the dominant hand in WRITE deviates from the actual shape of a hand holding a pen
3 LÜ is a family name, and the sign for it mimics the shape of the Chinese character.
or pencil (or chalk or brush, for that matter), nor why this handshape appears in precisely the
same form in the semantically unrelated words RICE and LÜ, nor indeed why TSL signers
represent the word meaning “write” in anything like this form at all, while signers of other
sign languages may use some other form. Regardless of the functional role of iconicity,
therefore, a formal theory of phonology is still necessary (for various opinions on the role of
iconicity in sign language, see Klima and Bellugi 1979; Armstrong, et al. 1995; Taub 2000).
As with spoken language phonology, analyzing signs into basic phonological units
quickly leads to important but difficult questions about the structure of the phonological
system. In the sign RENT, for example, the final handshape appears to be physically similar
to the [SAME] handshape seen in the sign PLACE. Thus one analysis would be to consider
[SAME] here to be lexically specified just like the initial handshape [RENT], similar to how
spoken languages form words from lexically specified combinations of consonants and
vowels. This is the position taken, within very different formal frameworks, by Liddell (1990)
and Uyechi (1996). However, most sign phonologists believe that this analysis misses the
high degree of predictability between handshapes in the vast majority of sign-internal
handshape change: aside from a small set of exceptions (including monomorphemic signs
historically derived from compounds), change always involves either all fingers or a specific
subset of adjacent fingers, it always involves either opening or closing all of the specified fingers
(never opening some and closing others), and often, as in the case of [RENT] and [SAME] in
the sign RENT, one handshape is simpler than the other (e.g. all open in the case of [SAME]).
This suggests that in some sense one handshape is derived from the other. While differing in
technical details and the precise scope of their empirical predictions, Sandler (1989), Brentari
(1990, 1998) and Corina (1990, 1993) all present formal phonological analyses of ASL
handshape change that capture this key insight. There is yet a third possibility, though,
namely that handshape changes should not be analyzed as sequences at all, but rather as
wholes related only indirectly to their apparent components, similar to the way affricates are
sometimes analyzed (see e.g. Lombardi 1990 and Steriade 1993); Channon (2002) takes a
position similar to this in analyses of ASL and other sign languages.
One of our ultimate goals in investigating TSL phonology experimentally was to provide
a new source of data to address theoretical questions like these. However, for the purpose of
providing background to the experiment described in this paper, which merely attempts to
establish that phonological form does indeed play a role in sign production, it suffices to
show that handshape change is a genuine aspect of the phonology of TSL. This can be seen
quite clearly from the nine signs used as our experimental items, described in Table 2 and
illustrated in Appendix B.
Table 2. Target items in the experiment

Sets                                              Heterogeneous groupings
Homogeneous   1. [ZERO] > [SAME]    FLOWER        SUN       NEW
groupings     2. [LÜ] > [SIX]       SMART         BEAN      WAKE UP
              3. [RENT] > [SAME]    NO BIG DEAL   INVENT    NEVER BEFORE
These signs can be put into three phonologically homogeneous groupings (using the
terminology established by Meyer 1990, 1991) so that all three members of a group share the
same handshape changes. In Table 2, these groupings are arranged horizontally, with the
shared handshape changes described in the Set column (“>” represents “changes into”). Note
that they differ in most other phonological features (location, orientation, path of movement,
and sometimes nonmanual features). Following the design established in Meyer (1990, 1991),
the nine signs can also be put into heterogeneous groupings (arranged vertically in Table 2)
such that the three members in each grouping do not share initial handshapes or overall
handshape change. Since all nine signs are monomorphemic, share no obvious semantic
features, and represent a variety of syntactic classes, it seems that any possible difference in
the processing of the homogeneous groupings versus the heterogeneous groupings would
have to be ascribed to their form.
It should be noted that we assume that the relevant form level here is phonological
(representable in terms of abstract categorical features) rather than merely phonetic
(involving physical similarities that do not necessarily correspond to abstract features). It is
notoriously difficult to separate these levels in practice (see also footnote 4 below), although
in the next section we will review some arguments given by Meyer (1990) and elsewhere for
supposing that the experimental paradigm we will apply does indeed tap into an abstract
phonological level. Our target item set may inadvertently help provide another argument,
since as pointed out by David Corina (p.c., May 3, 2004), one of our sets may actually
involve phonetically similar but not truly phonologically identical target items. Namely, in
Set 2, all three target items begin with the thumb and index finger forming a closed ring, but
the nature of the finger contact is not the same: two target items (SMART and BEAN) begin
with what Liddell and Johnson (1989) call “finger restrained contact” (the index finger nail
contacts the thumb pad, ready to be “flicked” off), while the third (WAKE UP) begins with
what they term “thumb pad contact”. Since this difference is not predictable, it should be
treated as phonemic. If our experimental paradigm is sensitive to phonological
representations, the Set 2 items, in spite of a great deal of phonetic similarity, should not
behave as “homogeneously” as the other two sets, whose target items do indeed appear to involve precisely the same phonological handshape changes.
3. Phonological production
Phonological knowledge has empirically observable effects not only in the patterns of
distributions and alternations studied by linguists, but also in the physical forms analyzed by
phoneticians, and in the behavior of language users when perceiving, recognizing, judging, or
producing phonological forms. This paper focuses just on one of these sources of evidence,
the production of words in isolation. One reason for this focus is the existence of a highly
sophisticated model of word production developed by Willem Levelt and colleagues (see
Levelt, et al. 1999). Armed with such a detailed model, experimental phonology is able to go
beyond the traditional search for mere “psychological reality” for various linguistic claims
and instead see language use as consisting of processes that occur in real time. Our
experiment was designed to begin the investigation into the time course of sign production by
using an experimental paradigm also prominent in the development of Levelt’s model. In this
section we briefly review the relevant aspects of this model and the evidence that has been
used to support it (section 3.1). We then describe the few studies that have looked at word
production in sign languages, and discuss their implications for modeling (section 3.2).
3.1 Phonological production in spoken language
The model presented in Levelt, et al. (1999) aims to be a complete model of word
production in spoken language, and as such describes not only phonological production but
also the processes involved when speakers choose words from among semantic competitors,
as well as the processing of syntactic features and morphological structure. For our purposes
the model can be described as dividing word production into three major stages: stage 1
involves the processing of word information prior to access of phonological form from the
lexicon, stage 2 involves phonological encoding, and stage 3 involves phonetic preparation
prior to articulation.
This division and ordering should seem quite familiar, since it is quite close to the
traditional linguistic view.4 What makes the model so powerful, however, is the range and
quantitative detail of empirical evidence that Levelt and his team have collected in support of
it, and its consequent degree of precision. For example, experimental evidence has gone
beyond previous models in suggesting that stage 2 itself consists of at least two distinct
processes: accessing the phonological form from memory (what we’ll call stage 2a), and
mapping phonological content (e.g. phonemes) into prosodic structure (stage 2b). The
research team has even managed to determine estimates for the temporal durations of each
stage (Levelt, et al. 1998; Levelt and Indefrey 2000). Table 3 gives estimates for these stages
in picture naming in milliseconds (msec).
Table 3. Estimated time course for picture naming in spoken language.

Stage                                          Duration    Cumulative time
1. Processing prior to phonological access     275 msec    275 msec
2. Phonological encoding:                      125 msec    400 msec
   2a. Initial access from lexicon
   2b. Mapping of units into prosody
3. Phonetic preparation                        200 msec    600 msec
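The cumulative times in the table are simply running sums of the stage durations; a quick illustrative sketch (durations taken from the table above):

```python
from itertools import accumulate

# Stage durations (msec) for picture naming, as estimated by
# Levelt, et al. (1998): pre-phonological processing, phonological
# encoding, and phonetic preparation.
durations = [275, 125, 200]

# Each stage's cumulative time is the running sum of the durations.
cumulative = list(accumulate(durations))
print(cumulative)  # → [275, 400, 600]
```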
Evidence for these stages, their ordering, and their duration comes from a wide variety of
sources (summarized in Levelt, et al. 1998 and Levelt, et al. 1999). The evidence most
familiar to linguists comes from natural speech errors (e.g. Fromkin 1971, 1973, 1980; Cutler
1982; Garrett 1980, 1988; Stemberger 1983). Among other things, nonphonological errors
tend to operate independently of phonological errors (stage 1 before stage 2), and phoneme
deletions, insertions, perseverations, anticipations and exchanges trigger the application of
allophonic processes (stage 2 before stage 3). However, to test more detailed hypotheses
about the stages and their time course, experimental methodologies must be used.
One powerful piece of evidence that stage 1 is prior to stages 2 and 3 comes from the
picture/word interference paradigm. In this paradigm, pioneered by Schriefers, et al. (1990),
experimental participants must produce the name of pictured objects while hearing auditory
distracters at the same time. Semantically related distracters only affect production latencies
(i.e. the duration between presentation of the visual prompt and initiation of articulation)
when presented early, while phonologically related distracters only affect production latencies
when presented late.
Experimental evidence is also crucial in establishing the distinction between stage 2a
(accessing phonological form) and stage 2b (mapping units into prosodic structure). One key
difference between these stages is that access in stage 2a can be affected by activation of any
part of the phonological form, but the mapping process of stage 2b proceeds strictly from left
to right (i.e. from the beginning of the word). As pointed out by Levelt, et al. (1999), a
particularly striking argument in favor of these claims comes from the different effects of
explicit versus implicit phonological priming. When primes are presented explicitly, as for
example as distracters in a picture/word interference experiment, production latencies for
words will be sped up whether the primes match the beginning or ending of the target word
4 Its division of phonology and phonetics into separate stages does not seem to fit well with the “emergent
categoricality” approaches presented in Kirchner (1997), Boersma (1998), Steriade (2000), Myers and Tsay
(2003), and elsewhere. Discussion of such issues, however, goes far beyond the scope of this paper.
(Meyer and Schriefers 1991). Since processing of explicit primes involves auditory access as
well as production, the effect here presumably occurs during selection of the phonological
form of the target, not during the mapping stage.
The effect of implicit priming is quite different. In the implicit priming paradigm,
pioneered by Meyer (1990, 1991), experimental participants are asked to memorize small
collections of cue-target pairs; the pairs are designed to ease retrieval of the target without
making it entirely predictable (e.g. house-room, bridge-poker). In homogeneous groupings,
the targets are all phonologically similar, while in heterogeneous groupings, they are not. The
participants are then presented with the cue words and must produce the associated targets as
quickly as possible. The implicit priming effect is defined as a shorter production latency for
a given target word when trained in a homogeneous grouping than when trained in a
heterogeneous grouping. The assumption is that this effect is due to the implicit primes
assisting in the on-line encoding of phonological forms. The alternative possibility that the
implicit priming effect is due to mere phonetic factors, such as motor preparation, is rejected
by Meyer (1990) and later work because the effect is greater with a greater amount of overlap
in the primes, which requires the involvement of a whole-word representation, not just
instructions about how to start it. Another alternative hypothesis, namely that the implicit
primes merely aid retrieval of phonological forms from long-term memory (at stage 2a), can
likewise be rejected. As pointed out by Meyer (1990), the size of the training sets in the implicit priming experiments
is much smaller than the size of training sets that memory studies have found are required to
aid lexical retrieval; Cholin, et al. (2004) also note that immediate serial recall tasks involving
phonologically similar words give rise to slower response times, not faster ones as they do in
the implicit priming paradigm. Most importantly, the memory retrieval hypothesis is also
inconsistent with the finding that implicit priming only occurs if words are phonologically
similar at the beginning, i.e. the first phoneme(s) or first syllable(s); by contrast, as noted
above, activation of phonological forms in memory at stage 2a can be triggered by
phonological cues anywhere in the word. The left-to-right nature of implicit priming implies
that the training sessions in this paradigm do indeed allow speakers to prepare part of the left-to-right mapping into prosodic structure.
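The implicit priming effect itself is simply the difference between mean production latencies under the two kinds of training. A minimal sketch of the calculation, using invented latency values purely for illustration:

```python
from statistics import mean

# Hypothetical production latencies (msec) for the same target words,
# trained in homogeneous vs. heterogeneous groupings.
# These numbers are invented for illustration only.
homogeneous = [571, 589, 602, 580, 595]
heterogeneous = [612, 630, 618, 641, 625]

# Implicit priming effect: latency saved when the shared word-initial
# form can be prepared in advance (positive = facilitation).
priming_effect = mean(heterogeneous) - mean(homogeneous)
print(round(priming_effect, 1))  # → 37.8
```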
Implicit priming experiments provide evidence not only for this mapping process but
also for the prior stage when the phonological form is accessed. According to Meyer (1991),
this stage reveals itself in implicit priming tasks through error rates: opposite from the
facilitation in reaction times, training with homogeneous groupings may induce higher error
rates than training with heterogeneous groupings. These opposite patterns can be explained if
error rate effects occur at the form selection stage, when similar forms compete for attention,
rather than at the mapping stage, when implicit primes should help speakers prepare their
productions. Meyer (1991) points out that this hypothesis also explains the independent
observation that error rate effects are found even when targets overlap only in later parts of
the word, while response latency effects only appear when targets overlap from the beginning
of the word.
Experiments also allow for estimates of the actual duration of the various stages. The
most basic calculation is of the production latency as a whole (i.e. the duration between
presentation of the visual prompt and initiation of articulation). Levelt, et al. (1998) point out
that 600 msec is a slight overestimate for picture-naming times (their own mean production
latency was 538 msec), but it seems quite accurate for response times in implicit priming
experiments, regardless of language. Thus the mean response times for Dutch words reported
in Meyer (1990, 1991) and for Chinese words reported in Chen, et al. (2002) were all around
600 msec. Task differences have a larger effect on overall response times. For example, mean
production latencies for the Chinese words read aloud in the masked priming experiments in
Chen, et al. (2003) were all below 500 msec. Presumably such differences in overall reaction
time across task are due to different durations for what we label stage 1, i.e. all the processes
that precede access of phonological form. The duration of these pre-phonological processes
can be estimated from various behavioral and neurological measures (see Levelt, et al. 1998;
Levelt and Indefrey 2000).
To estimate the combined duration for stages 2 and 3, Roelofs (1997) reviewed a
number of experiments using the picture/word interference paradigm, described earlier. Based
on a computational model of a variety of such studies, Roelofs (1997) estimated 265 msec as
the time from selection of the word to accessing the syllable, a duration that includes all of
stage 2 and part of stage 3. The estimate in Levelt, et al. (1998) for the duration of stage 2
alone comes from Wheeldon and Levelt (1995), who asked native Dutch speakers fluent in
English to decide if the Dutch translation of a word presented in English contained a given
phoneme; this task thus required speakers to encode phonological forms of words without
actually producing them. The response times for detecting word-initial phonemes in
disyllabic words were approximately 125 msec faster than for word-final phonemes. Levelt,
et al. (1998) then estimated the 200 msec of stage 3 by subtracting the cumulative time of the
previous stages from the total production latency. These estimates were found to be consistent
with the time course of brain activation patterns in both a magnetoencephalography (MEG)
study (Levelt, et al. 1998) and a meta-analysis of many other brain imaging studies (Levelt
and Indefrey 2000).
For reasons that will become clear when we describe our own experiment, it is important
to note that in all of these speech production experiments, production latency is measured by
means of a voice key, that is, a microphone attached to a computer that triggers a signal the
instant any sound is made. Thus what is measured is indeed the total duration of all three
mental stages, up to the point when speech physically begins. Unfortunately, this means that
it is difficult to separate out the effects that are due to stage 2 from those that are due to stage
3. The only methodology described in the literature that is apparently capable of examining
stage 2 separately from stage 3 is the cross-linguistic phoneme detection task of Wheeldon
and Levelt (1995), a task of limited usefulness due to its reliance on fluent bilinguals with
phonemic awareness developed from familiarity with an alphabetic orthography.
Moreover, it is also important to keep in mind that the production model itself is
continually undergoing refinement. For example, by going beyond response time measures
and including electrophysiologically measured brain activation patterns as well, Abdel
Rahman, et al. (2003) have argued for a certain amount of parallel processing in word
production; stages do not always follow each other in strictly serial fashion. Their specific
findings have little direct effect on the time course estimates given above, however, since
their evidence suggests only that semantic feature retrieval may continue even after
phonological processing has begun; the ordering of morphosyntactic feature (lemma)
retrieval prior to phonology is still “serial discrete” in their model (p. 858), and their results
say nothing about parallelism within phonological processing itself.
Despite such caveats, the model presented above is by far the most explicit and well-tested model available, certainly in the study of word production, if not in the study of phonological
processing in general. There is no obvious reason why it should not apply to sign languages
as well as spoken languages. If so, sign language production should not only also involve
activation of phonological units, but it should also show separate stages of phonological
processing that parallel both the order and durations of those found with spoken language.
3.2 Phonological production in sign language
Research on word production in sign languages is naturally far more limited than in
spoken languages. To date most of what we know comes from language errors, in studies on
ASL (Klima and Bellugi 1979; Newkirk, et al. 1980; Whittemore 1987) and German Sign
Language (Hohenberger, et al. 2002). In addition, Corina and Hildebrandt (2002) describe a
series of experiments on phonological processing in ASL, including a picture/word
interference production task. Here we briefly summarize the major findings and relate them
to the production model described above.
As with spoken languages, phonological errors in sign languages operate rather
independently from nonphonological errors at the morphemic or syntactic levels (e.g.
morpheme or word substitutions), suggesting that the division between stages 1 and 2 is valid
for sign languages as well (for linguistic evidence supportive of the same point, see Padden
and Perlmutter 1987). Phonological errors themselves treat the various parameters of sign
form as independent units, resulting, for example, in perseverations, anticipations, or
exchanges of handshape without altering location, movement, orientation or nonmanual
features. Moreover, like speech errors, slips of the hand almost never violate constraints of
the phonological system; Klima and Bellugi (1979), for example, found that only five of the
131 errors in their corpus contained “extrasystemic” gestures. This suggests that, like spoken
language, sign language production involves both stage 2 (encoding of phonological forms)
and stage 3 (preparation of phonetic forms, adjusted to fit the phonological system if errors
are made at stage 2).
In fact, in phonological errors there seems to be only one major difference between sign
and spoken languages (aside from the modality of the units involved). As noted by
Hohenberger, et al. (2002) in a study of language errors in German Sign Language, signers
are far less likely than users of spoken languages to produce exchange errors, where two units
switch location (as in the classic spoonerism sew you to a sheet). However, as these
researchers demonstrate, this isn’t due to deep differences in processing but rather only to a
very superficial effect of modality: the slower speed of the hands relative to oral articulators
gives signers more time to catch and correct errors before the complete exchange can be
produced, causing them to be realized as anticipations (analogous to sew you to a seat).
Interestingly, they also emphasize that this conclusion is consistent with another aspect of
Levelt’s production model not mentioned earlier. This is the self-monitoring process,
whereby language producers monitor the output of stage 2 and/or stage 3 before actual
articulation begins so that they can block the articulation of erroneous forms. Hohenberger, et
al. (2002:138) therefore conclude that “signed and spoken language production is, in
principle, the same.”
When we turn to the production experiment described in Corina and Hildebrandt (2002),
however, the picture at first seems to be somewhat more complex. In this experiment, native
ASL signers and native English speakers participated in parallel picture/word interference
tasks. The English task worked in precisely the same way as that of Schriefers, et al. (1990) (except
that only one timing condition was used, with simultaneous presentation of picture and
auditory interference word). In the ASL task, participants simultaneously saw the picture
whose name was to be signed overlapped with a semi-transparent video image of a signer
producing the interference word. In both tasks the interference word was semantically related,
phonologically related, or unrelated to the target word. The results showed that for both
groups of participants, semantically related interference words slowed responses (according to Levelt’s
model, this is due to competition during word selection in stage 1). However, while the
English participants showed very strong facilitation of production latencies from the
phonologically related words (i.e. explicit phonological priming), the ASL participants
showed no effect at all relative to the unrelated controls.
Null results are notoriously difficult to interpret, but Corina and Hildebrandt (2002)
report a similar lack of strong phonological effects for other experiments on phonological
processing in ASL. Thus, phonologically related primes had only a weak effect on response
times for word recognition in a lexical decision task, and in a handshape monitoring task,
native ASL signers failed to perform much better than late learners. An off-line phonological
similarity judgment task (described more fully in Hildebrandt and Corina 2002) even failed to
find major differences in performance between native ASL signers and hearing participants
with no ASL experience.
While all of these experiments imply that phonological form is relevant to the
processing of sign language, Corina and Hildebrandt (2002) themselves interpret the results
cautiously, commenting that “the behavioral effects of some phonological form-based
properties are difficult to establish” (p. 108). They speculate that the visual salience of
phonological articulation in sign languages eliminates the need for the complex mental
machinery that users of a spoken language require in order to reconstruct articulations from
acoustic waveforms (according to the Motor Theory of Speech Perception; Liberman 1996).
Thus, they argue, mental representations for phonological units and processes simply do not
become as active in the minds of signers as in the minds of speakers of spoken languages.
It may be, however, that this interpretation is overly cautious. Form-based priming may
indeed be weak in sign perception and recognition tasks for the reasons that Corina and
Hildebrandt (2002) suggest.5 Even granting this, it is not clear why production should be as
affected by visual salience as perception may be. Producers of signs are not trying to
reconstruct articulations from perceived forms, but to articulate them in actual fact. Moreover,
whether or not Levelt’s model is adopted, sign production must involve access of forms from
memory, a process that would be greatly simplified if the signs were treated as combinations
of a small set of reusable units. Indeed, as we have just seen, some of the best evidence for
the psychological reality of phonological units in sign language comes from production data,
in particular slips of the hand. The null result of Corina and Hildebrandt’s picture/word
interference study could be due to any number of factors unrelated to the role of phonological
form in production itself. Perhaps the participants were visually confused by the overlapping
images, or perhaps the task attempted to probe for phonological processing before signers had
actually reached stage 2.
Another factor to consider when pondering the results of reaction time experiments on
sign language production (or indeed on any topic) is the method by which the reaction times
were collected. This may seem trivial, but its relevance becomes clear as soon as one thinks
about the experiments in the context of a specific processing model, such as Levelt’s. The
method for the picture/word interference experiment is not described in Corina and
Hildebrandt (2002), but according to David Corina (p.c., September 26, 2002), it involved the
use of an infrared trip beam to signal the instant when participants raised their hands to begin
signing. The timing of this same event can also be measured by the keyboard lift-off method,
which Corina has successfully used in a lexical decision experiment on Spanish Sign
Language. In this method (which, unlike the trip beam method, requires no special
equipment), the experimental participant begins by resting his or her hands on a key on a
computer keyboard (e.g. the space bar). When he or she receives a visual prompt on the
computer screen, the hands are lifted and the computer records the time between the onset of
the visual prompt and the release of the key press.
Crucially, note that either method records the timing of a very different event from that
recorded by the voice-key method used in spoken language experiments. For speakers what is
measured is the instant when sound is produced by their mouths, which occurs at the end of
stage 3. By contrast, for signers what is measured is the instant when they have decided that
they know enough about the phonological form to begin signing, which is certainly well
5
However, see Moy (1990) for further experimental evidence of the psychological reality of phonological form
in sign processing.
10
before the end of stage 3. In fact, it is likely to be soon after initial contact is made with
phonological forms accessed from the lexicon in stage 2a. Since large articulators like the
arms are so slow compared to oral articulators, there is plenty of time for stages 2b and 3 to
be mentally prepared as the hands are being lifted into signing position. Therefore,
differences in results for experiments on spoken vs. sign languages may not be due to deep
differences in phonological processing at all, but rather differences in the stage of
phonological processing that is probed by the voice-key method vs. the trip-beam or lift-off
methods. This hypothesis will be explored more fully later.
4. An implicit priming experiment on TSL
The goals of this experiment were threefold. First and most fundamentally, we simply
wanted to know whether it was possible to perform an implicit priming task on a sign
language, since apparently it had never been tried. As we saw above, there are virtually no
psycholinguistic studies on sign languages that have used reaction-time measures at all.
Second, we wished to test the suggestion made by Corina and Hildebrandt (2002) that
phonological form does not play an important role in the on-line processing of sign language,
a suggestion made partly on the basis of a picture/word interference task conducted on ASL.
Our experiment was intended to provide data from a new language (TSL) using a new
production task (implicit priming). This task was chosen not only for its relative simplicity
compared to the picture/word interference task, but also because, as explained in section 3.1,
it is in principle capable of providing independent information on two stages of production:
access (stage 2a) and mapping (stage 2b) of phonological forms. Finally, the experiment was
planned as the first of a series examining the time course of phonological encoding in sign
production. If this first experiment was successful, in the future we hoped to apply the
implicit priming paradigm again, this time using materials that would allow us to test whether
phonological units are mapped left to right in sign language. Among other things,
determining this should shed light on the phonological nature of handshape change.
4.1 Methods
We followed the procedures for the implicit priming task described in Meyer (1990,
1991) as closely as possible.
4.1.1 Participants
Twenty deaf, fluent TSL signers were paid to participate in this experiment. Nine were
female, eleven male, and their ages ranged from 14 to 59 years old (average about 40 years
old). All used TSL as their primary language, though all were also able to read and write
Mandarin Chinese. Five were also able to speak and lip-read some Mandarin, one some
Southern Min, and one a little of both. Only four signers could be classified as “native”
according to the strict criterion used by Hildebrandt and Corina (2002) (i.e. they acquired
TSL from deaf parents), but the rest were exposed to TSL before the onset of puberty. Thus
the age of TSL acquisition ranged only up to 11 years old, with the average being 7 years old.
An additional seven TSL signers (including three who learned TSL when already older than
10 years old) were paid to participate in a pilot using dummy materials to test the procedure
and reaction time measures, and their results were not analyzed.
4.1.2 Materials, design, and procedure
As described earlier, we followed Meyer (1990, 1991) in choosing our materials so that
they could be arranged in two ways, either in groupings of words that were phonologically
similar, or in groupings of words whose phonological forms shared nothing in common. In
this particular experiment, similarity involved sharing the same handshape change, while
location, orientation, movement path, and nonmanual features were allowed to vary. We
settled on the nine one-handed signs shown earlier in Table 2 (see also Appendix B).
In order to trigger the production of these target items, each was associated with a cue
word or phrase, presented visually in Chinese. These cues, with their associated targets, are
listed in Table 4 below. The associations were designed merely to assist memorization of the
otherwise arbitrary cue-target pairs; the nature of the association was not an experimental
variable.
Table 4. Cues and their associated TSL targets.
Cues (Chinese)              Targets (TSL)
情人節 (Valentine’s Day)     花 FLOWER
熱 (hot)                    太陽 SUN
手機 (cell phone)            新 NEW
第一名 (no. 1)               聰明 SMART
貢糖 (candy)                 豆 BEAN
起床 (get out of bed)        醒 WAKE UP
討厭 (annoying)              不屑 NO BIG DEAL
科技 (technology)            發明 INVENT
殺人 (murder)                從來沒有 NEVER BEFORE
Note that unlike most experimental paradigms used in lexical research, lexical frequency
is ignored in the design of implicit priming experiments, other than ensuring that cues and
targets are familiar to all participants (see Meyer 1990). This is partly because response times
for individual items depend not only on characteristics of the target, but also on the
characteristics of the cues and how they relate to the targets. Since any given target is always
preceded by the same cue, there is no way to separate out these effects; they are inherently
confounded. This is not a problem, however, since the crucial comparison in this paradigm
relates to the effect of context, that is, homogeneous vs. heterogeneous groupings. Thus each
item acts as its own control: the only difference between a homogeneous vs. heterogeneous
trial is the training context.
Cue-target pairs were trained in either homogeneous groupings (i.e. the horizontal
groupings in Table 2) or heterogeneous groupings (i.e. the vertical groupings in Table 2).
Specifically, participants were told in TSL (by a hearing but fluent-signing experimenter)
which target word should be produced for each written cue. During each of these training
phases, the participant practiced until he or she was able to produce the expected target
reliably.
After a grouping of cue-target pairs was trained, each participant was presented with a
block of nine trials in which production latencies were measured. The block contained the
three cues that had just been trained, each repeated three times, with all trials presented in
random order but adjusted so that no item appeared two times in a row. The reaction-time
phases of the experiment were run on a laptop computer (PC clone running Windows Me),
with experimental control handled by E-Prime 1.0 (Schneider, et al. 2002). Production
latency was measured using the keyboard lift-off method. To begin each trial, participants placed their dominant hand on the keyboard, with the index finger depressing the space bar. The trial began with the display of the symbol + at the center of the screen, merely to orient the eyes to the correct location; after one second this was replaced by a cue word or phrase. Participants then had to lift their hand and begin signing the correct target word as quickly and as accurately as possible, without any hesitation. The
computer recorded the time between the onset of the display of the cue word and the release
of the space key (i.e. when the hand was lifted). The fluent-signing experimenters then
immediately coded responses into four categories: correct, wrong word choice, hands
hesitating on keyboard, and hands hesitating in the air after leaving the keyboard.
After each block was completed, the participant would then receive training in another
grouping of three cue-target pairs, followed by the relevant cue-production trials on the
computer, and so forth until three repetitions of each block were completed. The order of
blocks was randomized, with homogeneous and heterogeneous blocks mixed together. The
primary purpose of all this repetition (an inherent aspect of the implicit priming paradigm)
was to increase the total number of trials so that statistical analysis was possible. Each item
appeared equally often in homogeneous and in heterogeneous groupings. Thus each
item appeared 18 times during the course of the experiment (2 grouping conditions × 3
repetitions of blocks × 3 repetitions of items within each block), with a total of 162 trials (18
× 9 cue-target pairs).
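The block construction and trial counts described above can be sketched as follows. This is our own illustration, not the actual E-Prime implementation; the rejection-sampling randomizer is one simple way to satisfy the no-immediate-repeat constraint, and the cue strings are those of Set 1.

```python
import random
from collections import Counter

def build_block(cues, reps=3, seed=None):
    """Return a shuffled trial list in which no cue appears twice in a row.

    Sketch of the randomization constraint described in the text; the
    rejection-sampling approach here is for illustration only.
    """
    rng = random.Random(seed)
    trials = list(cues) * reps          # three cues x three repetitions = nine trials
    while True:
        rng.shuffle(trials)
        if all(a != b for a, b in zip(trials, trials[1:])):
            return trials

# One grouping of three cue-target pairs yields a nine-trial block:
block = build_block(["情人節", "熱", "手機"], seed=1)
assert len(block) == 9
assert Counter(block) == Counter({"情人節": 3, "熱": 3, "手機": 3})

# Overall design size: 2 grouping conditions x 3 block repetitions x
# 3 item repetitions per block = 18 trials per item; 18 x 9 items = 162.
trials_per_item = 2 * 3 * 3
assert trials_per_item * 9 == 162
```

The rejection loop always terminates in practice, since with three distinct cues most shufflings of a nine-trial block contain no adjacent repeats.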
Participants were arbitrarily assigned to two equal-sized groups, defined by whether the
first block they were exposed to was homogeneous or heterogeneous
(following standard procedures, this was done in case the first training experience colored the
participant’s behavior throughout the rest of the experiment). Each participant required 30 to
40 minutes to complete the experiment.
4.2 Results
We analyzed two aspects of the responses: response times (production latencies) and
error rates. Errors consisted of all responses coded as errors by the experimenters during the
experiment (i.e. wrong word choices and hesitations), plus responses with latencies of one
second or longer (the same criterion used by Meyer 1990). To prepare response time (RT) for
analysis, we grouped responses by condition (heterogeneous, homogeneous), and within
these, by set (Set 1, Set 2, Set 3), and within these, by repetition of blocks (repetition within
blocks was not separated out for analysis). We then calculated the average RT for each
combination of condition, set, and repetition. Note that “set” here refers to the set of items in
the design (i.e. the items appearing horizontally in Table 2), not necessarily the grouping of
items that appeared within a block during the experiment. Thus the words that appeared in the
analysis labeled “heterogeneous set 1” were the same as those in “homogeneous set 1”. The
only difference was that heterogeneous set 1 consisted of responses to words in Set 1
when they were trained and tested along with words of different phonological types (e.g.
FLOWER when trained with SMART and NO BIG DEAL), while homogeneous set 1
consisted of responses to words in Set 1 when trained and tested with words sharing
handshape change (e.g. FLOWER when trained with SUN and NEW). To prepare error rates
for analysis, we calculated the proportion of errors (as defined above) within each
combination of condition, set, and repetition.
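This preparation of the data can be sketched with a small pandas aggregation. The trial-level records below are hypothetical (the column names and values are ours), and we assume, as is standard, that error trials are excluded from the RT means.

```python
import pandas as pd

# Hypothetical trial-level records; column names and values are invented
# for illustration only.
trials = pd.DataFrame({
    "condition":  ["heterogeneous", "heterogeneous", "homogeneous", "homogeneous"],
    "set":        [1, 1, 1, 1],
    "repetition": [1, 1, 1, 1],
    "code":       ["correct", "wrong word", "correct", "correct"],
    "rt_ms":      [412, 1450, 398, 1100],
})

# An error is any response coded as an error by the experimenters OR any
# response with a latency of one second or longer (Meyer 1990's criterion).
trials["error"] = (trials["code"] != "correct") | (trials["rt_ms"] >= 1000)

# Mean RT per condition x set x repetition cell, over non-error trials only.
mean_rt = (trials[~trials["error"]]
           .groupby(["condition", "set", "repetition"])["rt_ms"].mean())

# Error rate: proportion of errors within each cell.
error_rate = trials.groupby(["condition", "set", "repetition"])["error"].mean()
```

With these toy data, each cell contains one valid RT and one error, so the error rate in both cells is 0.5.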
As required by the design, statistical analyses for both RT and error rates were
conducted within participants, but we also noted which order group (heterogeneous first or
homogeneous first) each participant belonged to and included this as a between-
participant variable. We then performed separate four-way ANOVAs on RT and error rates
(order × condition × set × repetition).6 Theoretical interest lies primarily in any main effect
and interaction involving the conditions and the sets (order and repetition were only included
to understand what role, if any, practice had on the responses over the course of the
experiment).
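A simplified version of this analysis can be sketched with statsmodels' repeated-measures ANOVA. The sketch below uses synthetic, balanced data (participant IDs and RT values are invented), and it covers only the three within-participant factors; the between-participant order factor from the published four-way ANOVA is omitted because AnovaRM handles within-participant factors only.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Synthetic, balanced data: one mean RT per participant for each
# condition x set x repetition cell, mirroring the shape described above.
rng = np.random.default_rng(0)
rows = [{"subj": s, "condition": c, "set": t, "repetition": r,
         "rt": 410 + rng.normal(0, 20)}
        for s in range(1, 21)                       # twenty participants
        for c in ("heterogeneous", "homogeneous")
        for t in (1, 2, 3)
        for r in (1, 2, 3)]
df = pd.DataFrame(rows)

# Three-way within-participant ANOVA (condition x set x repetition).
res = AnovaRM(df, depvar="rt", subject="subj",
              within=["condition", "set", "repetition"]).fit()
print(res.anova_table)  # F, df, and p for each main effect and interaction
```

The `anova_table` attribute lists one row per main effect and interaction, so the condition-by-set interaction reported below corresponds to the `condition:set` row.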
The effects of condition and set on response time are illustrated in Figure 1 below; the
same information is given in Table 5, along with standard errors.
Figure 1. The effects of condition and set on response time. [Bar graph: mean RT in msec (scale 350 to 450) by set (Set 1, Set 2, Set 3) for the heterogeneous and homogeneous conditions.]
Table 5. Means in msec (and standard errors) for reaction times.
        Heterogeneous    Homogeneous
Set 1   415 (11.3)       406 (11.8)
Set 2   409 (11.5)       413 (12.6)
Set 3   409 (10.9)       427 (13.0)
As hinted at by the large standard errors relative to the differences in RT, there was no
main effect of condition; mean response times for the heterogeneous condition (411 msec)
and homogeneous condition (415 msec) were not significantly different at the 0.05 level
(F(1,18) = 0.4, p = 0.53). There was also no main effect of set; mean response times for Set 1
(410 msec), Set 2 (411 msec) and Set 3 (418 msec) were not significantly different (F(2,36) =
1.13, p = 0.33). However, there was a significant interaction between condition and set
(F(2,36) = 4.5, p = 0.02), which is reflected in the different pattern of bar lengths in the left
versus the right side of Figure 1. In particular, it appears that in the heterogeneous condition,
there was very little difference in response times across the sets, while in the homogeneous
condition, differences were much more pronounced, with Set 1 the fastest and Set 3 the
slowest. No other effects or interactions were significant (all ps > 0.3), implying that there
were no effects of practice on RT over the course of the experiment. The lack of a main effect
of condition was apparently not due to the influence of a few recalcitrant items, since as
shown in Table 6, about half of the items showed longer RTs in the heterogeneous condition,
while half showed the opposite tendency.
6 One data point was missing in the RT analysis (one participant, in one condition, set, and repetition, made only errors, leaving no mean RT). We estimated this missing value following Winer (1971:488-9).
Table 6. Mean RTs (msec) by item.
                         Heterogeneous   Homogeneous   Difference
花 FLOWER                420             417             3
太陽 SUN                 404             395             9
新 NEW                   414             411             3
聰明 SMART               420             404             16
豆 BEAN                  402             403             -1
醒 WAKE UP               408             418             -10
不屑 NO BIG DEAL         420             420             0
發明 INVENT              416             427             -11
從來沒有 NEVER BEFORE     404             433             -29
The effects of condition and set on error rates are illustrated in Figure 2 below; means
and standard errors are given in Table 7.
Figure 2. The effects of condition and set on error rates. [Bar graph: error rate in % (scale 0 to 10) by set (Set 1, Set 2, Set 3) for the heterogeneous and homogeneous conditions.]
Table 7. Means (and standard errors) for error rates.
        Heterogeneous    Homogeneous
Set 1   2.6% (0.8)       4.1% (1.1)
Set 2   5.4% (1.2)       6.5% (1.2)
Set 3   5.7% (1.1)       9.4% (2.0)
This time there was a main effect of condition, quite a large effect in fact. The mean
error rate for the heterogeneous condition (4.6%) was significantly lower than for the
homogeneous condition (6.7%) (F(1,18) = 9.13, p = 0.007). As shown in Table 8, this pattern
was consistent, being found in six out of the nine items (with only one item showing the
opposite).
Table 8. Mean error rates by item.
                         Heterogeneous   Homogeneous   Difference
花 FLOWER                2.2%            5.6%            -3.4%
太陽 SUN                 2.2%            1.7%            0.5%
新 NEW                   3.3%            5.0%            -1.7%
聰明 SMART               7.8%            7.8%            0.0%
豆 BEAN                  3.3%            5.0%            -1.7%
醒 WAKE UP               3.3%            6.7%            -3.4%
不屑 NO BIG DEAL         6.1%            13.9%           -7.8%
發明 INVENT              8.9%            8.9%            0.0%
從來沒有 NEVER BEFORE     3.9%            5.6%            -1.7%
There was also a main effect of set, with the mean error rate for Set 1 (3.3%) lower than
that for Set 2 (5.9%), which was in turn lower than that for Set 3 (7.6%) (F(2,36) = 4.0, p =
0.03). However, there was no significant interaction between condition and set (F(2,36) =
0.75, p = 0.48); unlike the case with the response times, the pattern of increasing error rates
from Set 1 to Set 3 was basically the same in both conditions. There were also two significant
effects relating to repetition. First, there was a main effect of repetition (F(2,36) = 27.6, p <
0.0001), with the error rate for repetition 1 (more accurately, the first presentation of the
materials) being higher (9.8%) than for repetition 2 (4.4%), which was higher than for
repetition 3 (2.7%). This merely shows an effect of practice on reducing error rates.
Somewhat more interesting was a significant interaction between repetition and set (F(4,72)
= 5.49, p = 0.0006). This interaction is illustrated in Figure 3, where it can be seen that the
difference across sets was mainly found in repetition 1, when participants had their first
contact with the materials. After some practice with them, this effect disappeared.
Figure 3. The effects of repetition and set on error rates. [Bar graph: error rate in % (scale 0 to 16) by set (Set 1, Set 2, Set 3) for Repetitions 1, 2, and 3.]
Error rates also showed a nearly significant interaction between condition and order
(F(1,18) = 3.82, p = 0.066), since participants who received a homogeneous set first tended to
show a larger difference in error rates between the two conditions than did participants who
received a heterogeneous set first. No other effects were significant (all ps > 0.35), in
particular the interaction between condition and repetition: while practice reduced overall
error rates and reduced error rate differences across sets, it had no effect on reducing the
different patterns of responses to homogeneous versus heterogeneous conditions.
One final issue that should be mentioned before we move into the discussion is the
possible role of age of acquisition in the results. Studies (e.g. Mayberry and Fischer 1989;
Hildebrandt and Corina 2002) have found differences in the performance of signers born to
deaf signers versus those born to hearing parents (who are thus typically not exposed to a sign
language until they enter school) in how they perceive phonological forms. Though all of the
signers in our experiment acquired TSL prior to puberty, only four of them were, strictly
speaking, native signers (i.e. born to deaf signers). Nevertheless, when we looked for
evidence that native competence played any role in our results, no such evidence was found:
in new ANOVAs for RT and error rates that included native competence as a betweenparticipant factor, this factor showed no main effect and did not interact with any other factor.
5. Discussion
If nothing else, the experiment fulfilled its first goal: we demonstrated that it is possible
to run an implicit priming task on a sign language and obtain meaningful results. The most
important of these results related to the effect of condition: in both response times and error
rates, we found that the difference between heterogeneous and homogeneous conditions had
an effect. In response times, this effect was indirect, being found only in a differential
patterning across the sets in the two conditions. In error rates, the effect was quite robust,
with items in the homogeneous condition being produced with higher error rates (i.e.
hesitations both before and after lifting the hands from the keyboard, and production of the
wrong word). Thus our experiment has provided evidence for form-based effects on the
production of signs. Nevertheless, as with Corina and Hildebrandt’s (2002) experiment, we
failed to find a main effect on reaction times, with our most robust effects appearing instead in
error rates.
Before discussing how this pattern of results should be interpreted, we first examine a
factor that did have effects on both reaction times and error rates: set. As shown by error rates
(in both conditions) and response times (only in the homogeneous condition), items in Set 1
([ZERO] > [SAME] signs) seemed to be easier (lower error rates, faster response times) than
items in Set 3 ([RENT] > [SAME] signs), with items in Set 2 ([LÜ] > [SIX] signs) falling in
between. It is important to resist the temptation to interpret these differences as necessary
consequences of the differences in phonological forms of the words in these sets, since the
sets also differed in at least three other ways: the lexical frequency or familiarity of the target
forms, the lexical frequency or familiarity of the Chinese cues used to prompt the signers, and
the associative relations between the cues and the targets. Of these factors, the only two about
which we have concrete information are the lexical frequency or familiarity of the cues and
targets. Since the prompts were Chinese words or phrases, we can look up their frequencies
in a large corpus. In our case we ran searches for them on www.google.com (see Blair, et al.
2002 for evidence that Internet search engines provide reliable frequency estimates). The
results are shown in Table 9.
Table 9. Estimated frequencies of Chinese cues (www.google.com, 9:30 am 2/21/2003)
Set 1   情人節 (Valentine’s Day)    119,000
        熱 (hot)                   2,330,000
        手機 (cell phone)           2,240,000
        Average                    1,563,000
Set 2   第一名 (no. 1)              290,000
        貢糖 (candy)                1,660
        起床 (get out of bed)       632,000
        Average                    307,887
Set 3   討厭 (annoying)             299,000
        科技 (technology)           9,670,000
        殺人 (murder)               640,000
        Average                    3,536,333
Although the average for Set 3 ends up being the highest, this is due solely to the
unnaturally high frequency of the word meaning “technology”, likely reflecting the bias of
webmasters more than anything else. Removing this item gives Set 1 cues the highest average
frequency and makes the frequencies for Sets 2 and 3 roughly comparable.
We also know something about the lexical frequency or familiarity of the TSL target
signs themselves. The TSL textbook series by Smith and Ting (1979, 1984), like any good
language textbook, introduces vocabulary in a sequence judged to be the most useful. Thus
the division of vocabulary across the two volumes can be taken as a reasonable estimate of
vocabulary usefulness, and hence of frequency and familiarity. Applying this to the current
experimental materials, we observe that all three of the items in Set 1 are introduced in
Volume 1, all three of the items in Set 3 are introduced in Volume 2, and the items in Set 2 are
mixed (two are introduced in Volume 1, and one is introduced in Volume 2).
Thus the lower error rates for Set 1 are associated not only with more familiar Chinese
cue words, but also more familiar TSL targets, while the higher error rates for Set 3 are
associated with a lower degree of familiarity in both Chinese prompts and TSL targets.
Another clue that differences in error rates across the sets were due to frequency effects rather
than phonology comes from the interaction between repetition and set on error rates,
illustrated earlier in Figure 3. This interaction shows that the set difference effect was solely
due to participants’ first exposure to the items. This is what one would expect from frequency
effects, which can be counteracted by repeated exposure. By contrast, the error rate difference
between the homogeneous and heterogeneous conditions did not wane during the course of
the experiment.
Again we must clarify that in contrast to most experimental paradigms used in research
on word processing, frequency effects themselves are not really relevant here, except to show
that, unsurprisingly, nonphonological factors played a role in our experiment. A deeper
analysis of frequency effects is not possible due to the inextricable confounding between cue
and target properties, and in any case, such an analysis would tell us less than one might
expect. For example, it may seem, as David Corina (p.c., May 2, 2004) has suggested, that
frequency effects, or the lack thereof, could be relevant in determining the processing stage
probed in our experiment: only lexical stages of processing should show such effects.
However, in Levelt’s model all stages are lexical to some degree: even the phonetic encoding
stage involves retrieval from a lexical syllabary of stored articulatory gesture programs
(Levelt and Wheeldon 1994; Cholin, et al. 2004). Thus frequency effects should be
ubiquitous in any appropriately designed experiment.
Although nonphonological differences across sets are likely to be the primary factors
causing the different error rates, we should also briefly consider the possible influence of the
degree of homogeneity within each set. Recall that when we introduced the materials we
noted that the target items in Set 2 do not seem to be fully phonologically homogeneous:
SMART and BEAN begin with finger restrained contact, while WAKE UP begins with thumb
pad contact. The set may thus be an example of an “odd-man-out” set (in the terms of Cholin,
et al. 2004) and should therefore be expected to show weaker effects than the other two sets,
which were fully homogeneous. Though there was no significant interaction between set and
condition in error rates, there was indeed a trend in precisely this direction: as can be seen
from Table 7 above, Set 2 showed a smaller difference in error rate between the homogeneous
and heterogeneous conditions (1.1%) than either Set 1 (1.5%) or Set 3 (3.7%). The lack of
significance and the possible influences of nonphonological factors here mean that we should
take this observation with a great deal of caution, but it may be worth following up in future
studies.
We now turn to a discussion of the most important finding of the experiment: robust
error rate effects without main effects of reaction time. As noted earlier, higher error rates in
homogeneous contexts are also commonly found in implicit priming experiments performed
on spoken languages, and, beginning with Meyer (1991), this has been taken to suggest that
error rate effects occur at stage 2a, when phonological forms are first being accessed from
memory. Yet unlike most implicit priming experiments conducted on spoken languages, we
failed to find differences in overall reaction time across the two grouping conditions, thus
missing the effect that has been claimed to occur at stage 2b, when phonological units are
being mapped into prosodic structure. Here we consider two factors that may have affected
our results, viewed within the framework of Levelt’s production model.
The first factor is the phonological structure of our experimental materials. It is possible
that the phonological forms in our homogeneous groupings, while indeed similar, were not
similar “from left to right”. That is, although they shared phonological elements (the
handshape change, including the first handshape in the change), they differed in other
parameters at the beginning of the sign (in particular, location and orientation). Thus the set
of phonological features linked to the initial timing slot in the prosodic structure (i.e. the first
“segment”) would not have been identical across the items even within a homogeneous
grouping. Roelofs (1999) has shown that in Dutch, mere featural similarity in onsets (e.g. /b/
and /p/) was not enough to trigger the implicit priming effect; onset segments had to be
identical in all features. Similarly, Chen, et al. (2002) found that when Chinese syllables
matched only in tone, which like handshape change is distributed across the entire syllable,
there was no standard implicit priming effect either. It is thus possible that phonological
differences between the signs we used and the forms used in most spoken language implicit
priming experiments may have led them to be processed in different ways.
The phonological structure of the materials is certainly an important factor to keep in
mind for future experiments. However, we believe that a second, methodological factor may
have had a much greater influence in creating the pattern of our results. Namely, our use of
the lift-off method to measure response times may mean that we tapped into an earlier stage
of word production processing than the voice-key method used for spoken languages.
According to the argument sketched in section 3.2, the signers in our experiment must have
lifted their hands after achieving initial access of phonological forms at stage 2a, before the
mapping of stage 2b could even begin. This hypothesis would immediately explain the
significantly higher error rate in the homogeneous condition (due to processing at stage 2a)
and the lack of RT differences (missed since stage 2b had not yet been reached).
A key further prediction of this hypothesis is that the overall response time in our
experiment, missing as it did stages 2b and 3, should be quite a bit faster than those observed
in spoken language studies. In fact, with the durations of these stages estimated as in Table 3,
we can make this prediction quantitatively precise: our overall reaction time should be around
200 msec faster (an overestimation for the duration of stage 3, i.e. stage 3 plus a bit of stage
2). Recall that in spoken languages, production latencies in implicit priming experiments are
around 600 msec, a value that is consistent across word length, language, and size of the cue-target sets (see e.g. Meyer 1990, 1991 for Dutch; Chen, et al. 2002 for Chinese). By contrast,
as can be seen from Table 5 above, our average response times for TSL were just a little over
400 msec. More precisely, the mean RT over all 360 data points used in the RT analysis (2
conditions × 3 sets × 3 repetitions × 20 participants) was 413 msec (standard deviation 92
msec). Our lower overall RT cannot be due to how we eliminated erroneous responses, since
we followed standard methods here as well (e.g. Meyer 1990 also rejected RT values over
one second). The difference also cannot be ascribed to differences in manual vs. oral
articulation, since in both modalities what is measured in these experiments is the time before
articulation actually begins, and in any case manual articulation is slower, yet our response
times were faster. These observations suggest not only that our signers were lifting their
hands from the keyboard prior to stage 2b, but also that the durations for the preceding and
following stages were approximately the same as those deduced for the production of spoken
languages.
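The arithmetic behind this prediction can be sketched in a few lines. All of the figures below come from the discussion above (the ~600 msec spoken-language baseline, the ~200 msec estimate for the stages skipped by the lift-off method, and the observed TSL mean and standard deviation); the comparison itself is illustrative rather than a statistical test.

```python
# Latency prediction sketch: if lift-off responses occur after stage 2a,
# the stages skipped (2b and 3) should be subtracted from the spoken baseline.

SPOKEN_BASELINE_MS = 600     # typical implicit-priming naming latency (spoken)
SKIPPED_STAGES_MS = 200      # estimated duration of stage 2b plus stage 3
OBSERVED_TSL_MEAN_MS = 413   # mean lift-off RT over the 360 data points
OBSERVED_TSL_SD_MS = 92      # standard deviation of those RTs

predicted_ms = SPOKEN_BASELINE_MS - SKIPPED_STAGES_MS    # 400 msec
deviation_ms = abs(OBSERVED_TSL_MEAN_MS - predicted_ms)  # 13 msec

# The observed mean lies within a small fraction of one SD of the prediction.
print(predicted_ms, deviation_ms, deviation_ms < OBSERVED_TSL_SD_MS)
```

On these figures the observed mean (413 msec) differs from the predicted value (400 msec) by far less than one standard deviation, which is what the argument in the text requires.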
Yet another argument for these conclusions comes from the different RT patterns across
sets in the homogeneous vs. heterogeneous conditions. Recall that in the heterogeneous
condition, RT values were quite close across Set 1, Set 2, and Set 3, while in the
homogeneous condition, Set 1 was fastest while Set 3 was slowest, consistent with the
difference in error rates (highest in Set 3 and lowest in Set 1). As argued earlier in this section,
the cross-set differences were likely due to frequency effects, not phonological properties.
Now, Meyer (1991) noted that the effects due to what we call stage 2a were not only different
from those due to stage 2b, but were also more sensitive to varying aspects of the materials,
such as “the strength of the associations between prompts and response words, the relative
frequencies of the words, and the semantic relations among them”, so that stage 2a effects
could arise “only if several of these factors conspired in making the selection of the response
words particularly difficult” (p. 85). Applying this view to our own experiment, we expect to
find response time differences across sets to show up more strongly in the homogeneous
condition, when items are competing phonologically, since this competition would add to the
“conspiracy” that makes word form selection (stage 2a) sufficiently difficult to affect
responses. This is in fact just what the interaction between condition and set seems to show.
Summarizing, then, where our results differ from those found with spoken languages, it
is apparently primarily because a difference in methodology (lift-off vs. voice-key) led to our
probing into an earlier stage of phonological production than the implicit priming
experiments that have been conducted on spoken languages. The production process itself
seems to be identical across modalities, even down to the detailed time course of the stages.
6. Conclusions
Intensive linguistic research over the past few decades has established beyond any
reasonable doubt that sign languages employ phonological systems quite comparable, in both
function and form, to those of spoken languages. In language production in particular, the evidence
is quite strong that phonological units like handshapes are manipulated mentally by signers
just as producers of spoken languages manipulate phonemes and features. The null results of
the ASL production experiment reported in Corina and Hildebrandt (2002) are the sole
anomaly in the previous literature, but like all null results, they merely provide a call for
further research.
The experiment described in this paper not only provides further evidence for the mental
processing of handshape in the production of signs, but also suggests a possible reason for the
null results in Corina and Hildebrandt’s experiment: the measure of reaction time they used
may have tapped into an earlier stage of processing, before all aspects of phonological form
were fully fleshed out in signers' minds. This hypothesis is supported by a variety of
arguments, including the precise duration of the reaction times. Too often psycholinguists
analyze reaction times merely to find out if they are different across conditions, without
considering the information provided by the absolute values themselves. After all, reaction
times represent the duration of real processes occurring in real time. The analyses presented
in this paper demonstrate that an understanding of precisely what is being measured in an
experiment can be crucial. In this case, they suggest that sign language processing shares
quite deep similarities with spoken language processing, even down to the ordering and fine
temporal detail of the stages. The statement of Hohenberger, et al. (2002) declaring the
production of sign and spoken languages to be the same is even more accurate than they may
have realized.
Nevertheless, it must be admitted that something of a methodological challenge is
presented by the discovery that the keyboard lift-off method used for sign language taps into
a different stage from the voice-key method used for spoken language. Namely, if researchers
are interested in the time course of the final stages of sign production, stages apparently
missed by the lift-off method, some other method for measuring reaction times must be
developed. One possibility would be to use high-speed video and then estimate response
times by counting frames. High-speed video would be necessary, since response time
differences across conditions (as estimated from research on spoken languages) are too close
to the limits of temporal resolution of standard video (about 30 msec). Yet using video to
measure response times is not only very labor-intensive, but there are also questions about its
inherent reliability. A voice-key is triggered the instant the mouth begins to make noise, but
how should one objectively define the precise moment when a signer truly begins to
articulate a sign?
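The temporal-resolution point can be made concrete with a small sketch. Only the ~30 msec effect size comes from the discussion above; the particular frame rates tried, and the rule of thumb that several frames should fit inside the effect, are illustrative assumptions.

```python
# Frame counting can resolve an RT difference only down to the duration of
# one video frame; higher frame rates give proportionally finer resolution.

def frame_interval_ms(fps: float) -> float:
    """Duration of one video frame in milliseconds."""
    return 1000.0 / fps

EFFECT_SIZE_MS = 30.0  # rough size of implicit-priming RT effects (from text)

for fps in (30, 60, 120, 240):
    interval = frame_interval_ms(fps)
    # Heuristic: treat frame counting as adequate only when at least
    # three frames fit inside the effect being measured.
    adequate = interval <= EFFECT_SIZE_MS / 3
    print(f"{fps:>3} fps -> {interval:5.1f} msec/frame, adequate: {adequate}")
```

At the standard 30 frames per second one frame lasts about 33 msec, the same order as the effect itself, which is why high-speed recording would be needed.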
Regardless of the method for measuring RT, the implicit priming method may also be
somewhat problematic for further study of handshape change in particular. As noted in
section 2, a primary reason for the interest in handshape change is the question of whether it
is represented as a sequence of handshapes or as a whole. Phonological analyses in the sign
literature typically assume that in some sense handshape changes are composed of separate
(though autosegmentally linked) handshapes (e.g. Sandler 1989; Brentari 1990, 1998; Corina
1990, 1993). The handshape detection experiment described in Corina and Hildebrandt (2002)
provided some psycholinguistic evidence for this view, since no RT difference was found
between detecting a handshape in a sign without handshape change and detecting the first
handshape in a sign with handshape change (detecting the second handshape naturally took
slightly longer, since participants had to wait until the sign neared completion). Yet if we
want to address the question of handshape change composition with an implicit priming
experiment, we face a possible problem with the materials.
Suppose we want to know if production of the sign FLOWER really begins with access
of the individual handshape [ZERO]. The natural thing to do would be to include FLOWER
and ZERO together in the homogeneous training condition. Unfortunately, however, these
signs would have a different number of handshapes and thus possibly different prosodic
structures. Research on spoken languages has shown that the implicit priming effect only
occurs if items in the homogeneous condition are prosodically identical (see e.g. Roelofs and
Meyer 1998). It may seem better, then, to train FLOWER along with another handshape-change sign that also begins with the [ZERO] handshape, but unfortunately this is impossible.
As noted earlier, each of the handshapes that participate in handshape change is typically
predictable from the other. Thus handshape changes that begin with [ZERO] necessarily end
with [SAME], just as in FLOWER. Therefore, further research on the production of
handshape change seems to require the development of methods beyond those currently
described in the literature.
On the other hand (so to speak), the very limitations on our original research questions
may ultimately prove beneficial to the study of phonological production in general. The lift-off method appears to tap into a stage prior to articulatory preparation, perhaps prior even to
the mapping of phonological units into prosodic structure. As far as we are aware, no method
developed for spoken language has this capability. The closest seems to be the method of
Wheeldon and Levelt (1995), yet as noted earlier, this method has limited usefulness for
many languages. The challenge is that there is no physical correlate (short of data that could
only be collected through expensive and time-consuming brain imaging studies) for the
moment when a speaker has accessed the phonological form of a word, but has not yet begun
to flesh it out. The lift-off method, however, appears to provide just such a physical correlate.
Given that all the evidence so far points to the conclusion that language production works
precisely the same way for sign and spoken languages, research on sign language could thus
provide insights into the working of spoken language that would not be available any other
way. This provides yet another argument for the position championed for forty years by sign
language researchers: far from being an exotic novelty, sign languages can actually provide
crucial insights into the human language faculty that would otherwise never be uncovered.
References
Abdel Rahman, Rasha, Miranda van Turennout, and Willem J. M. Levelt. 2003. Phonological
encoding is not contingent on semantic feature retrieval: An electrophysiological study
on object naming. Journal of Experimental Psychology: Learning, Memory, and
Cognition 29 (5):850-860.
Armstrong, D., W. Stokoe, and S. Wilcox. 1995. Gesture and the Nature of Language.
Cambridge, UK: Cambridge University Press.
Blair, I. V., G. R. Urland, and J. E. Ma. 2002. Using Internet search engines to estimate word
frequency. Behavior Research Methods, Instruments, and Computers 34 (2): 286-290.
Boersma, Paul. 1998. Functional Phonology. The Hague, Netherlands: Holland Academic
Graphics.
Brentari, Diane. 1990. Licensing in ASL handshape change. Sign Language Research:
Theoretical Issues, ed. by Ceil Lucas, 27-49. Washington: Gallaudet University Press.
Brentari, Diane. 1998. A Prosodic Model of Sign Language Phonology. Cambridge,
Massachusetts: The MIT Press.
Channon, Rachel. 2002. Signs Are Single Segments: Phonological Representations and
Temporal Sequencing in ASL and Other Sign Languages. University of Maryland at
College Park doctoral dissertation.
Chen, J.-Y., T.-M. Chen, and G. Dell. 2002. Word-form encoding in Mandarin Chinese as
assessed by the implicit priming task. Journal of Memory and Language 46: 751-781.
Chen, J.-Y., W.-C. Lin, and L. Ferrand. 2003. Masked priming of the syllable in Mandarin
Chinese speech production. Chinese Journal of Psychology 45:107-120.
Cholin, Joana, Niels O. Schiller, and Willem J. M. Levelt. 2004. The preparation of syllables
in speech production. Journal of Memory and Language 50: 47-61.
Corina, David P. 1990. Handshape assimilations in hierarchical phonological representation.
Sign Language Research: Theoretical Issues, ed. by Ceil Lucas, 27-49. Washington, DC:
Gallaudet University Press.
Corina, David P. 1993. To branch or not to branch: Underspecification in ASL handshape
contours. Current Issues in ASL Phonology, vol. 3: Phonetics and phonology, ed. by
Geoffrey Coulter, 63-95. New York: Academic Press.
Corina, David P. and Ursula C. Hildebrandt. 2002. Psycholinguistic investigations of
phonological structure in ASL. Modality and Structure in Signed and Spoken Languages,
ed. by R. P. Meier, K. Cormier, and D. Quinto-Pozos, 88-111. Cambridge: Cambridge
University Press.
Coulter, Geoffrey R. (ed.) 1993. Current Issues in ASL Phonology. New York: Academic
Press.
Cutler, Anne. (ed.) 1982. Slips of the Tongue and Language Production. The Hague: Mouton.
Fromkin, Victoria A. 1971. The non-anomalous nature of anomalous utterances. Language
47:27-52.
Fromkin, Victoria A. 1973. Speech Errors as Linguistic Evidence. The Hague: Mouton.
Fromkin, Victoria A. (ed.) 1980. Errors in Linguistic Performance: Slips of the Tongue, Ear,
Pen and Hand. New York: Academic Press.
Garrett, M. F. 1980. The limits of accommodation. Errors in Linguistic Performance, ed. by
V. A. Fromkin, 263-271. New York: Academic Press.
Garrett, M. F. 1988. Processes in language production. Linguistics: The Cambridge Survey.
Vol III: Language: Psychological and Biological Aspects, ed. by F. J. Newmeyer, 69-96.
Cambridge: Cambridge University Press.
Hildebrandt, Ursula C., and David P. Corina. 2002. Phonological similarity in American Sign
Language. Language and Cognitive Processes 17: 593-612.
Hohenberger, Annette, Daniela Happ, and Helen Leuninger. 2002. Modality-dependent
aspects of sign language production: Evidence from slips of the hands and their repairs
in German Sign Language. Modality and Structure in Signed and Spoken Languages, ed.
by R. P. Meier, K. Cormier, and D. Quinto-Pozos, 112-142. Cambridge: Cambridge
University Press.
Kirchner, Robert. 1997. Contrastiveness and faithfulness. Phonology 14: 83-111.
Klima, Edward S., and Ursula Bellugi. 1979. The Signs of Language. Cambridge, MA:
Harvard University Press.
Levelt, Willem J. M. and Peter Indefrey. 2000. The speaking mind/brain: Where do spoken
words come from? Image, Language, Brain: Papers from the First Mind Articulation
Project Symposium, ed. by Alec Marantz, Yasushi Miyashita, and Wayne O’Neil, 77-93.
Cambridge, MA: MIT Press.
Levelt, Willem J. M., Peter Praamstra, Antje S. Meyer, Päivi Helenius, and Riitta Salmelin.
1998. An MEG study of picture naming. Journal of Cognitive Neuroscience 10: 553-567.
Levelt, Willem J. M., Ardi Roelofs, and Antje S. Meyer. 1999. A theory of lexical access in
speech production. Behavioral and Brain Sciences 22: 1-75.
Levelt, Willem J. M., and Linda Wheeldon. 1994. Do speakers have access to a mental
syllabary? Cognition 50: 239-269.
Liberman, Alvin M. 1996. Speech: A Special Code. Cambridge, MA: MIT Press.
Liddell, Scott K. 1990. Structures for representing handshape and local movement at the
phonemic level. Theoretical Issues in Sign Language Research, Vol. 1: Linguistics, ed.
by Susan Fischer and Patricia Siple, 37-65. Chicago: The University of Chicago Press.
Liddell, Scott K., and Robert E. Johnson. 1989. American Sign Language: The phonological
base. Sign Language Studies 64: 195-277.
Lombardi, L. 1990. The nonlinear representation of the affricate. Natural Language and
Linguistic Theory 8:375-425.
Lucas, Ceil, and Clayton Valli. (eds.) 2000. Linguistics of American Sign Language (3rd ed.).
Washington, DC: Gallaudet University Press.
Mayberry, R. I., and S. D. Fischer. 1989. Looking through phonological shape to lexical
meaning: The bottleneck of non-native sign language processing. Memory and
Cognition 17: 740-754.
Meyer, Antje S. 1990. The time course of phonological encoding in language production: The
encoding of successive syllables of a word. Journal of Memory and Language 29: 524-545.
Meyer, Antje S. 1991. The time course of phonological encoding in language production:
Phonological encoding inside a syllable. Journal of Memory and Language 30: 69-89.
Meyer, Antje S., and H. Schriefers. 1991. Phonological facilitation in picture-word
interference experiments: Effects of stimulus onset asynchrony and types of interfering
stimuli. Journal of Experimental Psychology: Language, Memory, and Cognition 17:
1146-1160.
Moy, Anthony. 1990. A psycholinguistic approach to categorizing handshapes in American
Sign Language: Is [As] an allophone of /A/? Sign Language Research: Theoretical
Issues, ed. by Ceil Lucas, 346-357. Washington, DC: Gallaudet University Press.
Myers, James and Jane Tsay. 2003. A formal functional model of tone. Language and
Linguistics 4 (1):105-138.
Newkirk, Don, Edward S. Klima, Carlene C. Pedersen, and Ursula Bellugi. 1980. Linguistic
evidence from slips of the hand. Errors in Linguistic Performance: Slips of the Tongue,
Ear, Pen, and Hand, ed. by Victoria A. Fromkin, 165-197. New York: Academic Press.
Padden, Carol A., and David M. Perlmutter. 1987. American Sign Language and the
architecture of phonological theory. Natural Language and Linguistic Theory 5(3):335-375.
Roelofs, Ardi. 1997. The WEAVER model of word-form encoding in speech production.
Cognition 64: 249-284.
Roelofs, Ardi. 1999. Phonological segments and features as planning units in speech
production. Language and Cognitive Processes 14: 173-200.
Roelofs, A., and A. S. Meyer. 1998. Metrical structure in planning the production of spoken
words. Journal of Experimental Psychology: Learning, Memory, and Cognition 24:922-939.
Sandler, Wendy. 1989. Phonological Representation of the Sign: Linearity and Nonlinearity
in American Sign Language. Dordrecht: Foris.
Sandler, Wendy. 1990. Temporal aspects and ASL phonology. Theoretical Issues in Sign
Language Research, ed. by Susan D. Fischer and Patricia Siple, 7-35. Chicago:
University of Chicago Press.
Sandler, Wendy. 2000. One phonology or two? Sign language and phonological theory. The
First Glot International State-of-the-Article Book: The Latest in Linguistics, ed. by Lisa
Cheng and Rint Sybesma, 349-383. Berlin: Mouton de Gruyter.
Schneider, W., A. Eschman, and A. Zuccolotto. 2002. E-Prime Reference Guide.
Pittsburgh: Psychology Software Tools Inc.
Schriefers, H., A. S. Meyer, and W. J. M. Levelt. 1990. Exploring the time course of lexical
access in language production: picture-word interference studies. Journal of Memory
and Language 29:86-102.
Smith, Wayne. 1989. The Morphological Characteristics of Verbs in Taiwan Sign Language.
Indiana University doctoral dissertation.
Smith, Wayne H. and Li-fen Ting. 1979. Shou Neng Sheng Qiao [Your Hands Can Become a
Bridge], Vol. 1. Taipei: Deaf Sign Language Research Association of the Republic of
China.
Smith, Wayne H. and Li-fen Ting. 1984. Shou Neng Sheng Qiao [Your Hands Can Become a
Bridge], Vol. 2. Taipei: Deaf Sign Language Research Association of the Republic of
China.
Stemberger, J. P. 1983. Speech Errors and Theoretical Phonology: A Review. Bloomington,
Indiana: Indiana University Linguistics Club.
Steriade, Donca. 1993. Closure, release, and nasal contours. Nasality (Phonetics and
Phonology 5), ed. by M. Huffman and R. Krakow, 125-153. New York: Academic Press.
Steriade, Donca. 2000. Paradigm uniformity and the phonetics-phonology boundary. Papers
in Laboratory Phonology V: Acquisition and the Lexicon, ed. by M. B. Broe and J. B.
Pierrehumbert, 313-334. Cambridge, UK: Cambridge University Press.
Stokoe, William C. 1960. Sign language structure: An outline of the visual communication
systems of the American Deaf. Studies in Linguistics, Occasional Papers 8.
Stokoe, William C., Dorothy C. Casterline and Carl G. Croneberg. 1965. A Dictionary of
American Sign Language on Linguistic Principles. Silver Spring: Linstok Press.
Taub, S. 2000. Language and the Body: Iconicity and Metaphor in American Sign
Language. Cambridge, UK: Cambridge University Press.
Uyechi, Linda. 1996. The Geometry of Visual Phonology. California: CSLI Publications.
van der Hulst, Harry, and Anne Mills (ed.) 1996. Issues in Sign Linguistics: Phonetics,
Phonology and Morpho-syntax. Lingua 98 (special issue).
Wheeldon, L., and W. J. M. Levelt. 1995. Monitoring the time course of phonological
encoding. Journal of Memory and Language 34: 311-334.
Whittemore, Gregory L. 1987. The Production of ASL Signs. University of Texas at Austin
doctoral dissertation.
Winer, B. J. 1971. Statistical Principles in Experimental Design. New York: McGraw-Hill
Book Company.
[Received XXX XXX 2003; revised XXX August 2004; accepted XXX XXX 2004]
James Myers
Graduate Institute of Linguistics
National Chung Cheng University
Minhsiung, Chiayi 621
Taiwan
[email protected]
Appendix A: TSL signs containing the indicated handshapes
ZERO ([ZERO])
HAVE ([HAND])
SIX ([SIX])
LÜ ([LÜ] on both hands)
PLACE ([SAME])
RENT (handshape changes from [RENT] to [SAME])
FAST ([SIX])
CUT CLASS ([ZERO] on dominant hand, [HAND] on nondominant hand)
RICE ([ONE] on dominant hand, [LÜ] on nondominant hand)
WRITE ([LÜ] on dominant hand, [HAND] on nondominant hand; dominant hand moves
downward across the nondominant hand as if writing)
STICK (dominant hand changes from [RENT] to [HAND]; nondominant hand remains
[HAND] throughout)
Appendix B: The nine signs involving handshape change used as production targets in
the experiment, grouped by the sets that formed the basis of the experimental design.
Set 1: [ZERO] > [SAME]
FLOWER
SUN
NEW
Set 2: [LÜ] > [SIX]
SMART
BEAN
WAKE UP
Set 3: [RENT] > [SAME]
NO BIG DEAL
INVENT
NEVER BEFORE
Phonological Production in Taiwan Sign Language
James Myers, Hsin-hsien Lee, and Jane Tsay
Graduate Institute of Linguistics, National Chung Cheng University
This paper describes an experiment on handshape change (the counterpart of phonological
alternation in spoken language) in the production of Taiwan Sign Language. The experiment
used the implicit priming paradigm (Meyer 1990, 1991). The results not only provide
evidence that phonological form plays an important role in sign production, but also show
that the time course of phonological production in sign language is consistent with the time
course found for spoken word production by Levelt, Roelofs, and Meyer (1999). The
experiment further highlights the importance of certain methodological considerations in the
study of phonological production, in sign language and spoken language alike.
Key words: Taiwan Sign Language, phonology, psycholinguistics