BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
The Time Course of Grammaticality Judgment
Arshavir Blackwell Elizabeth Bates Dan Fisher
University of California, San Diego
Language and Cognitive Process, 11(4), August, 1996, 337-406
ABSTRACT. Three experiments investigating the time course of grammaticality judgment are presented, using sentences that vary in error type (agreement, transposition, omission of function
words), part of speech (auxiliaries vs. determiners) and location (early vs. late error placement).
Experiment 1 is a word-by-word cloze experiment in which subjects are presented with successively longer fragments of a sentence and instructed to complete the sentence grammatically, if
possible. Experiment 2 is a self-paced, word-by-word grammaticality judgment experiment. Results of both these experiments are quite similar, showing that some error types elicit a broad
and variable Òdecision regionÓ instead of a more punctate Òdecision point.Ó To explore the implications of this Þnding, Experiment 3 looks at on-line judgments of the same stimuli in an
RSVP paradigm, with a single response (and reaction time). Correlations amongst the three experiments are extremely high and all signiÞcant, suggesting that the incremental tasks are tapping into the same decision-making process as is found on-line. Implications of these Þndings
for the error types that do and do not appear in aphasia are discussed.
INTRODUCTION
have been linguists trained to detect subtle
structural facts that may not be obvious to
laymen confronted with the same sentence
stimuli. As a result, the conclusions reached
by linguists do not always match the conclusions that one might draw if analyses
were based on grammaticality judgments
by naive listeners (for a detailed discussion
of this point, see Levelt, 1974). This is a
perfectly legitimate reason for linguists to
keep their judgments in-house. However, it
Halfway through the twentieth century,
linguistics underwent a major methodological shift, from distributional analysis of native-speaker speech (BloomÞeld, 1961), to
the analysis of native-speaker intuitions
about legal sentence types (Chomsky, 1957;
for reviews, see Newmeyer, 1980; Sells,
Shieber & Wasow, 1991). In most cases, the
native speakers who furnish these intuitions
This research was supported by an award from NIH/NIDCD 2 R01 DC00216-10 to Elizabeth Bates, ÒCross-linguistic
studies in aphasia.Ó We are grateful to Jeff Elman, Judith Goodman, Mark St. John, and Marta Kutas for comments on an
earlier draft of this paper. Please send correspondence to: Arshavir Blackwell, Center for Research in Language, University of California, San Diego, La Jolla, CA 92093-0526. E-mail:
[email protected]
-1-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
ERP studies: Studies of event related
brain potentials (ERP) of subjects exposed
to linguistic stimuli have been used to draw
a variety of conclusions about the language
processor; e.g., that semantic processes and
syntactic processes have at least partially
separate biological components (Hagoort,
Brown & Groothusen, 1993; Neville, Nicol,
Barss, Forster & Garrett, 1991; Osterhout
& Holcomb, 1993; Brown, Hagoort, &
Vonk, 1995). On the whole, these types of
studies have assumed a punctate point at
which the sentence became ungrammatical,
and thus compared ERPs at that only one
point between the ungrammatical and
grammatical control sentences.1 Because
these studies often use a word-by-word
grammaticality judgment paradigm similar
to those we use (i.e., subjects read a sentence one word at a time while their ERPs
are recorded), knowing more about the nature of this psychological process may offer
new insights into what is happening in these
experiments, and thus perhaps provide alternative interpretations of the results.
is not a good reason for psycholinguists to
avoid the study of grammaticality judgment
as a processing domain.
Because judgments of well-formedness
lie at the heart of one of the most important
movements in modern cognitive science, it
would be useful if we could learn more
about the nature and time course of this
psychological process. What psycholinguistic phenomena may inßuence such
metalinguistic judgments? (e.g., Levelt,
1972 1974, 1977). This is sufÞcient rationale for explorations of grammaticality
judgment as a psychological process (in naive as well as expert subjects), although
there are other reasons why this performance domain should be studied in more
detail. For example:
Aphasia: Grammaticality judgments
have played an increasingly important role
in research on language breakdown in
aphasia (Caplan, 1981; Caramazza &
Berndt, 1985; Caramazza & Zurif, 1976),
where one continuing puzzle has been that
if these patients suffer from a deÞcit in the
on-line activation of grammar, why are they
able to make reasonably good judgments of
grammaticality in on-line studies (for details, see Linebarger, Schwartz & Saffran,
1983; Shankweiler, Crain, Gorrell & Tuller,
1989; Wulfeck & Bates, 1991; Wulfeck,
1987; Tyler, 1992)? To answer this question
we need more information about normal
on-line grammaticality judgment.
The Experiments
Experiment 1 ascertains what sorts of
grammatical completions subjects entertain
as the sentence unfolds. Subjects are asked
to provide a possible grammatical completion of the sentence at each word. This cloze
experiment should yield valuable information about the number, range and strength
of the alternative completions that subjects
-2-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
may have in mind at each point across the
course of the sentence.
As we shall demonstrate below, our
technique will elicit some of the classic effects reported by authors using these three
paradigms. Finally, our technique is related
to recent studies of sentence processing (including sentences with grammatical violations) using event-related brain potentials
as the primary dependent variable (Kutas &
Kluender, 1991; Hagoort et al., 1993; Neville et al., 1991; Osterhout & Holcomb,
1993). However, our paradigm requires
conscious judgments of grammaticality at
every time point, whereas the ERP technique can be used to detect response to violations with no explicit task other than
reading or listening. In Experiment 3, the
same sentence stimuli are used in a simple
reaction time study, where subjects are
asked to push the button once for each sentence as soon as they know whether that
sentence is grammatical or not. As we shall
see, any conclusions that can be drawn
about the time course of grammaticality
judgment will depend crucially on the point
that is used to deÞne the onset of the error,
a Þnding that presents an interesting challenge to research programs that assume a
single violation point.
Experiment 2 is a self-paced, word-byword reading task where, after each word
appears, subjects press one of three buttons
(ÒgrammaticalÓ, ÒungrammaticalÓ, Ònot
sureÓ), indicating their judgment of the
grammaticality of the sentence to that
point. We expect some sentence stimuli to
yield a sharp boundary after which most
subjects agree that the sentence cannot be
salvaged (i.e., there is no well-formed way
for it to continue). We term this a Òdecision
point.Ó However, other sentence stimuli
may yield a decision-making region that
spans several words. We term this a Òdecision region.Ó Furthermore, subjects may
show marked individual differences in the
size of this decision region, and the speed
with which decisions are made at each point
within that region. The elicitation of wordby-word grammaticality judgments bears a
clear relationship to other word-by-word
techniques in the visual modality (e.g., Just
& Carpenter, 1980; Rayner, Carlson & Frazier, 1983; see also Boland, Tanenhaus,
Carlson & Garnsey, 1989; Boland, Tanenhaus & Garnsey, 1990; Mauner, 1992).
-3-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
GENERAL METHOD
get sentences (see Appendix I for all
stimuli).
Grammaticality Judgment Stimuli
for All 3 Experiments
Creation of ungrammatical targets.
The 84 ungrammatical targets and 40 grammatical controls come from a pool of grammatical sentences from 8 to 12 words long.
This pool of sentences represents a range of
seven structural types, varying in presence
and location of prepositional phrases, presence or absence of relative clause or subordinate clauses, and the number of adjectives
modifying the subject and object (see Appendix II). Approximately twenty different
sentence tokens were constructed for each
of these seven structural types, and randomly assigned to the appropriate ungrammatical target cell or grammatical control
condition. Half of the sentences in this pool
had at least one auxiliary verb to be the target of an auxiliary violation, while the other
half of sentences had at least one determiner (including numerals and demonstrative
adjectives) to be the target of a determiner
violation:
Ungrammatical targets. Stimuli were
168 sentences: 84 ungrammatical target
sentences, 40 grammatical control sentences matched for length and grammatical
structure, and 44 distractors (see below).
Experimental design focused on the ungrammatical targets, which varied in: a)
part of speech of the error (auxiliary vs. determiner); b) the position of the error (early
or late in the sentence), and, most importantly, c) type of violation (i.e., errors of
omission, agreement and transposition).
Thus, the ungrammatical target sentences
formed a 2 ´ 2 ´ 3 design, with part of
speech, location, and error type as withinsubject variables.
Grammatical controls. Each of the
twelve cells in the design had seven ungrammatical sentences. For each of these
ungrammatical sentences, there was a
grammatical control sentence matched for
length and grammatical structure. To keep
the experiment reasonably short, some
grammatical sentences were used as controls for more than one particular ungrammatical sentence. There were also 44
distractor sentences (22 grammatical and
22 ungrammatical) from 3 to 17 words
long, and of various structures. Distractors
were to prevent subjects from detecting regularities in the length and nature of the tar-
Auxiliary verb sentences: On half of
these items, the auxiliary was located early in the
sentence (e.g., ÒThey were reading several large
maps while waiting for the next train.Ó), while on
the other half, the auxiliary was located near the
end of the sentence (e.g., ÒIn a big, old, red boat,
two girls were rowing slowly.Ó).
Determiner sentences: On half of these
items, the target determiner was located early in
the sentence (e.g., ÒThe girl was eating some dark
chocolate ice cream.Ó), while on the other half, the
-4-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
some psycholinguistic differences between
the error types (e.g., Wulfeck & Bates,
1991; Wulfeck, Bates & Capasso, 1991;
Wulfeck, 1987). We have opted for the second strategy. Our choice of materials for
these studies is motivated (at least in part)
by recent research on grammatical breakdown in aphasia. In particular, we know
that some error types (i.e., omission and/or
substitution of functors) are very common
in speech production by aphasic patients.
Other error types (i.e., word order violations like Òdog theÓ or morpheme order violations like Òing-kissÓ) are exceedingly
rare (Bates, Wulfeck & MacWhinney,
1991). One possible explanation for this
sharp difference in the probability of error
types might lie in the monitoring mechanism that normals and aphasics use to detect errors in their own speech and/or to
weed out errors before they are produced. If
normal listeners are particularly sensitive to
word order errors, but less sensitive to errors of agreement and omission, then we
may conclude that the monitoring device is
less sensitive to errors of agreement and
omission under pathological conditions. To
test this hypothesis, we are building on an
earlier grammaticality judgment study by
Wulfeck & Bates (1991). Using auditory
stimuli, these authors showed that normal
English listeners are faster at detecting errors produced by moving a function word
downstream from its normal position (e.g.,
ÒShe is selling booksÉÓ * ÒShe selling is
booksÉÓ), compared with errors produced
target determiner was located near the end of the
sentence (e.g., ÒMy new blue and green silk ball
gown was costing a fortune.Ó).
Location of error. Early errors occurred within the Þrst 1200 msec (milliseconds) of the sentence (in the RSVP task),
while late errors occurred after this point.
The licensing word and the error were always adjacent (i.e., all local errors). Thus,
we used errors such as ÒThe girl were * going,Ó or ÒA girls * were goingÓ (where the
error was caused by the wrong juxtaposition of two directly adjacent words) but not
ÒA large black-and-white dogs were goingÓ
(where the mismatch is between ÒAÓ at the
beginning of the sentence and ÒdogsÓ several words downstream). Because omission,
agreement and transposition errors were
created from the same basic sentence types,
it can be argued that these stimuli represent
a set of minimal contrasts. Nevertheless,
even within a well-controlled stimulus set,
there are complicating factors affecting our
interpretation, to which we now turn.
Rationale for the Stimulus Materials
In designing stimuli for grammaticality
judgment, the experimenter has two choices: create grammatical deformations which
cleave along the lines of some linguistically
motivated theory (usually but not necessarily Generative Grammar; e.g., Kluender,
1992; Linebarger et al., 1983) or create sentences whose ill formedness is agnostic as
to particular linguistic theory, and is motivated by an empirical demonstration of
-5-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
ing,Ó the subject knows that a verb that
should have been proceeded by an auxiliary
was not. Second, because the word ÒeatingÓ
is soon followed by a period (visible at the
end of every sentence stimulus), the subject
may conclude that no further items will
come along to salvage the sentence (e.g. the
sentence will not turn into something such
as, ÒWhile sitting on the red sofa, her older
friend eating some cake was watching
TV.Ó). Hence we might argue that the above
examples each provide the subject with two
distinct error cues, illustrated as follows:
ÒWhile sitting on the red sofa, her older
friend eating * some cake. *Ó
by substituting an incorrect form of the
same function word within its usual position in the sentence (e.g., ÒShe is selling
booksÉÓ * ÒShe are selling booksÉ.Ó). In
the present study, we have expanded the set
of violations used by Wulfeck et al. to include omission errors (e.g., ÒShe is selling
booksÉ.Ó * ÒShe selling booksÉÓ). We
have also moved to the visual modality (removing any cues to ungrammaticality that
might be due to intonation and/or coarticulation), and added the cloze and incremental grammaticality judgment (GJ)
experiments.
Rules for creating the three error
types
Agreement errors: replace the target
word with an item that doesnÕt agree in
number. Note that violations of determiner
agreement within a subject noun phrase
provide two cues to the agreement violation. Cue one is the mismatch in number between determiner and noun (i.e., Òa girls
*Ó); cue two is the mismatch between the
auxiliary verb and the determiner (the auxiliary verb can only agree with one of the
two elements within the subject noun
phrase, either locally with the preceding
noun, or globally with the determiner). This
situation can be symbolized as, ÒA girls *
were * working quietly near the small, red
house.Ó The divergence point is just after
the noun (for determiner errors) or verb (for
auxiliary errors) which licenses the element
that is in error.
Omission errors: remove the relevant
word (auxiliary or determiner) from the
sentence (see Table 1). The asterisk (never
visible to subjects) refers to the aforementioned divergence point. Thus, for omission
errors the divergence point is just after the
word following the point where the omitted
element should go.
An additional complication comes from
the contrast between early and late omission errors. Because all of our sentences are
marked with normal English punctuation,
late omission errors often involve a double
cue. For example, given a late auxiliary
omission error such as, ÒWhile sitting on
the red sofa, her older friend eating* some
cake,Ó the subject actually has two cues to
help him decide whether an error has occurred. First, after reading the word Òeat-
-6-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Transposition errors: move the relevant word one word downstream from
where it belongs. The divergence point is
just after the word following where the
moved element should go and before where
the moved element actually is. This matches the divergence point for omissions and is
the Þrst point at which the subject might notice that a potential element is missing (although see the note above about this). This
suspicion will, of course, be conÞrmed
when the subject encounters the displaced
element. Hence transposition errors constitute another instance in which there are really two cues to the existence of an error,
one at the Þrst point at which a subject
might notice that there is a hole (similar to
omission errors) and another at the point
further downstream where the displaced element
occurs.
Subjects are forced to make up their minds
at the divergence point on many late agreement and omission errors, because the sentence is already over (as indicated by a
periodÑsee Appendix I); by contrast, they
are able to delay their decisions for a while
on the transposition errors. Hence any differences that we may observe in the size of
the decision region for late errors may be a
by-product of unavoidable structural differences among the three late-violation types.
For this reason, all analyses of timing and
decision points will be conducted separately for early vs. late errors.
Variability within types. This design
has violation points for what is putatively
the ÒsameÓ error not necessarily always at
the same structural point, as shown in sentences 1.1 and 1.2 above. This leaves us
open to a potential criticism, that we are
creating our effects by artiÞcially choosing
some arbitrary point (the divergence point)
where subjects ÒshouldÓ detect the error,
and then demonstrating that they do not
necessarily detect the error at that point. We
must again stress that the divergence point
is not necessarily where subjects will Þrst
detect an error, though experimental subjects will certainly never detect an error
(correctly) earlier than the divergence
point. The divergence point is a point
structurally common across the various
error types and items (to the extent possible), as well as being the Òorigination
pointÓ of all of the error deformations (as
Late errors. There is one further difference between late omission errors and the
other two late error types: On transposition
errors, the moved element means that the
sentence will necessarily last one word
longer after the divergence point than it
does with errors of omission or agreement.
Because the three error types share the
same divergence point (i.e., they start to deviate at exactly the same point in the sentence), this need not constitute a problem.
However, if subjects cannot make up their
minds at the divergence point and want to
wait for more information before they decide, then we are faced with an artifact:
-7-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
determiner transposition errors, yet we
have found that for sentences such as 1.1,
subjects tend to indicate that the ungrammaticality occurs just after the transposed
determiner ÒtheÓ, while they judge 1.2 as
ungrammatical after the Þrst word after the
transposed determiner (in this case ÒareÓ),
entertaining completions such as ÒWomen
three hundred years ago were the subject of
oppression.Ó
Table 1 shows). It is an empirical question
where subjects will make their decisions,
and, of course, that is part of what we are
investigating. The diversity of sentences
with the ÒsameÓ error type is deliberate, and
a strength of these stimuli, as they directly
map onto the error types that we are investigating. These are error types which, as
stated above, do appear to have some kind
of psychological reality. Thus, for example,
Wulfeck (1987) reported differential sensitivity to transposition and agreement errors,
not to, e.g., transposition errors of only one
certain type. Certainly that leaves open
whether errors of a certain type are hewn
from one homogeneous kind. However, we
would argue that the proper Þrst step is to
examine more complete, albeit variegated,
sets of each of the various error types, as
that is what we knowÑat this pointÑto
have psychological reality, rather than to
cleave off and examine only sub-types of
these various errors.
1.1 Women the * + are walking to the
store
1.2 Women three * are + walking to the
store.
Stimulus design considerations
Our approach to this issue is to let the
subjects decide where the error begins (i.e.,
this is an empirical question), locking all
sentences within a particular class to a common divergence point, deÞned operationally as the point at which ungrammatical and
grammatical sentences of a particular type
differ due to the violations that we have imposed (see Table 1).
For a particular error type (e.g., transposition errors on determiners early in the sentence) the ungrammaticality does not
necessarily begin at the same structural
point (e.g., directly after the auxiliary verb
or determiner), yet it is this structural point
that the sentences have in common. For example, both sentence 1.1 and 1.2 are early
Reading-span test: One technique we
used to attempt to account for individual
differences is the Òreading spanÓ test (Carpenter & Just, 1989; Daneman & Carpenter,
1980; Just & Carpenter, 1992). However,
the test had little to tell us about the results
in these experiments, and for the sake of
brevity it is not reported on here.
-8-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Table 1. Grammaticality judgment stimuli
omission
auxiliary verbs
agreement
transposition
omission
determiners
agreement
transposition
Joan [was] making * several big and tasty ice cream drinks.
Joan were * making several big and tasty ice cream drinks.
Joan [ ] making * was several big and tasty ice cream drinks.
[A] Boy * is driving a large van that the artist has painted.
Those boy * is driving a large van that the artist has painted.
[ ] Boy * a is driving a large van that the artist has painted.
EXPERIMENT 1: Cloze
center of the screen. The subject pressed the
middle button to bring the Þrst word of the
sentence to the screen. Subjects were instructed to use the index Þnger of their
dominant hand.
Method
Subjects. Ten college students (all
right-handed) participated in the experiment for course credit and payment. All
subjects were native English speakers, with
little if any facility in any other languages.
The sentence was centered vertically
and started at the left side of the screen.
Each button press brought the next word
onto the screen, until the entire sentence
was visible. After the last word appeared
the button press caused the next ÒREADYÓ
cue to appear.
Stimuli. See ÒGeneral MethodÓ.
Equipment. Each sentence was presented one word at a time, using an IBMPC/XT with a GoldStar 1210A amber
screen monitor. SubjectsÕ spoken response
was recorded on a Marantz PMD201 tape
recorder, using a Beyer-Dynamic Soundstar
MK-II microphone. Subjects also responded using a Carnegie-Mellon button box.
Subjects responded with one of two button
presses: Ògood,Ó (meaning that they completed the sentence grammatically), or
ÒbadÓ (meaning that they could not complete the sentence grammatically).
The experimenter instructed subjects to
try to complete, aloud, the sentence as read
so far, and to press the ÒgoodÓ button if they
did so. They were told that Òany grammatical sentence is acceptable as a completion.Ó
Subjects were instructed that if there was
Òno way to Þnish the sentence grammatically,Ó they were to say ÒcanÕt completeÓ and
press the ÒbadÓ button. Subjects were instructed to read the entire sentence aloud,
rather than merely their completion. The
experimenter told subjects that once they
Procedure. A trial began with a
ÒREADYÓ cue appearing near the bottom
-9-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
believed that the sentence could not be
completed grammatically, they should continue with the ÒcanÕt completeÓ response if
they continued to believe that the sentence
could no longer be completedÑeven if the
remainder of that sentence seemed wellformed. They were instructed to complete
the sentence only if they could generate a
complete, grammatically correct sentence.
During the instruction phase, some subjects
asked the experimenter whether a particular
practice item was correct or incorrect.
When this occurred, subjects were again
told to base their responses on what they
themselves considered to be correct grammar. When the entire sentence was on the
screen (including the period), subjects were
instructed to read it aloud if they believed it
to be grammatical, and press the ÒgoodÓ
button, or, if they thought it not grammatical, to again say ÒcanÕt completeÓ and press
the ÒbadÓ button.
The actual experiment consisted of 168
trials using the sentence stimuli described
in the ÒGeneral MethodsÓ section. Each
subject received the sentences in a different
random order, determined by the computer
program. Subjects were told that they
would receive a break at the mid-point of
the experiment (this was after trial 84). At
this point, instead of the ÒREADYÓ cue, the
subject received a ÒPLEASE WAITÓ cue.
Subjects were given ten to twenty practice sentences of similar kind before the ac-
tual experiment, depending upon how
clearly they understood the task.
Scoring: Both the point at which subjects Þrst said ÒcanÕt completeÓ and the
sorts of grammatical completions subjects
gave up until that point were transcribed.
Our primary dependent measure is the
mean number of words past the divergence
point that subjects Þrst said they could not
complete the sentence grammatically.
Results
Overall performance for non-Þller
sentences
The statistics we report on are only for
the Òcore stimuliÓ or non-Þller sentences.
Almost all of the stimulus sentences were
correctly judged by the end of the sentence.
Subjects had a mean hit rate to ungrammaticals of 96.8%, with only 2.8% false alarms,
which is an AÕ of 98.5 (AÕ is a non-parametric statistic used to correct for response bias
(Grier, 1971; Pollack & Norman, 1964). No
individual subject AÕ was below 97.8.
By-item analyses
94.1% of the ungrammatical experimental stimuli were responded to correctly
by the end of the sentence by at least 90%
of all subjects, with all but one of the remaining items responded to correctly by at
least 70% of all subjects. The one item
which had a 40% correct rate (sentence
#8.11) is dropped from further analysis.
-10-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Cloze experiment: mean words past divergence point
A: Early errors
mean words past CSP
3
2
B
B: Late errors
3
2
auxiliary
O
B
determiner
1
O
0
O
B
1
0
B auxiliary
O
OB
OB
determiner
-1
omis.
agree.
-1
trans.
omis.
agree.
trans.
error type
error type
Figure 1. Cloze experiment: Mean number of words past the divergence point.
omis. = omission; agree. = agreement; trans. = transposition
Analysis of variance
Subject responses were converted to a
score indicating mean number of words
past the divergence point at which subjects
Þrst gave a Òcannot completeÓ response
(that is, at which subjects could no longer
generate a grammatical completion to the
sentence as read so far). The data were submitted to two analyses of variance, one for
early errors, the other for late errors. The
within-subject factors were part of speech
(auxiliary vs. determiner) and error type
(omission, agreement, transposition), with
subjects as the random factor. A parallel
analysis by items, with the between-subjects factors of part of speech and error
type, and with sentences as the random factor, is also presented.
Early errors: For early errors, both
type (F1(2,18) = 5.69, p < 0.0122; F2(2,36)
= 8.13; p < 0.05) and part of speech ´ type
(F1(2,18) = 7.03, p < 0.0055; F2(2,36) =
6.62; p < 0.05) were signiÞcant. Agreement
errors had a mean score of 0.24, transposition 0.91, and omission 1.36; a NewmanKeuls analysis showed agreement and
omission to be signiÞcantly different from
each other (by items, agreement < transposition = omission). A breakdown of the interaction, by part of speech, showed that for
auxiliary errors, omission errors (2.06)
were signiÞcantly higher than either transposition (0.71) or agreement (0.07), using
Newman-Keuls. For determiner errors,
transpositions (1.11) were signiÞcantly
higher than omission (0.66) or agreement
errors (0.40). See Figure 1A.
-11-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Table 2. Cloze Experiment: Percent sentences judged ungrammatical at
divergence point
early
auxiliary
determiner
late
omission
69.6
77.6
agreement
97.1
100.0
transposition
60.9
51.5
omission
56.5
100.0
agreement
66.2
100.0
transposition
11.4
30.8
Late errors: For late errors, type
(F1(2,18) = 8.26, p < 0.0029; F2(2,35) =
9.66; p < 0.05) was signiÞcant, and part of
speech ´ type (F1(2,18) = 3.80, p < 0.05;
F2(2,35) = n.s.) was marginally signiÞcant.
Agreement errors had a mean score of 0.11, omission 0.09, and transposition 0.49;
a Newman-Keuls analysis showed transposition signiÞcantly different from the other
two conditions (also by items). A breakdown of the interaction, by part of speech,
showed that for auxiliary errors, agreement
errors (-0.04) were signiÞcantly lower than
either transposition (0.47) or omission
(0.33), using Newman-Keuls. For determiner errors, transposition errors (0.50)
were signiÞcantly higher than omissions (0.16) or agreement errors -0.18). See Figure 1B.
What sorts of responses are subjects
making?
The cloze experiment, besides allowing
us to see at what point subjects can no long-
er generate a grammatical completion of a
sentence, also permits us to ask what sorts
of completions subjects are making at each
point, when they still believe the sentence
can be saved. Overall, 67.7% of correctlyresponded-to ungrammaticals were deemed
ungrammatical by the divergence point.
Some responses fell into a miscellaneous
category including sentences where subjects brießy (for a few words) changed their
choice; e.g., giving a grammatical completion for several gates, saying at the next
word Òcan't completeÓ, then continuing the
grammatical completion. This occurred in
roughly 3% of ungrammatical sentence responses, and is ignored in this analysis.
Table 2 shows the by-cell percentage of
sentences judged ungrammatical at the divergence point. Here is a breakdown of the
sorts of grammatical completions subjects
provided when they continue to give a response after the divergence point; see Figure 2 for a graphical representation for the
-12-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
major categories. (we recognize that some
completions may fall into more than one
category; however, each sentence was only
placed in one).
Early errors:
Auxiliary errors: For omissions,
69.6% were deemed ungrammatical (Òcan't
completeÓ) by the divergence point. The
grammatical completions at or after that
point were either:
•
¥
present-participial verb-phrase completion (e.g., “The boy taking [sentence fragment seen by subject]… a
black car is a criminal [subject’s
completion],” 90.5%) or
gerund + ÒthatÓ clause completion
(ÒTomÕs mother forgettingÉthat he
had already packed his lunch began
to pack his lunch,Ó 9.5%).
Determiner errors: For omissions,
56.5% were deemed ungrammatical by the
divergence point. The grammatical completions were:
•
use of noun with copula (“Boy… is a
term that is used with a condescending air,” 30.0%),
¥
use of noun as title or proper noun
(ÒBoyÉ George is a very strange
person,Ó 26.7%),
use of noun as adjective (ÒWomanÉ
doctors are better than man doctors,Ó
20.0%),
use of noun in a general sense, or to
stand in for a group (ManÉ is said to
be GodÕs greatest creation,Ó 13.3%),
and
noun as interjection (ÒBoy Édo I
have a sore finger,Ó 10.0%).
¥
¥
¥
For agreement errors, 97.1% were
deemed ungrammatical by the divergence
point. The grammatical completions were
all corrections of the existing grammatical
error. For transposition errors, 60.9% were
deemed ungrammatical by the divergence
point. The grammatical completions were:
•
present-participial verb-phrase completion (88.8%),
¥
gerund + ÒthatÓ completion (7.4%),
and
use of noun as adjective (ÒStudents
writingÉ is put in the offices of some
elementary schools,Ó 3.7%; note that
many of these types of completions
involved subjects mistakenly using a
plural noun as a possessive; recall that
the stimuli are visual.)
¥
-13-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Figure 2. Cloze experiment: Breakdown of grammatical completions into major
categories by cell.
For agreement errors, 66.2% were
deemed ungrammatical by the divergence
point. The grammatical completions were:
¥
•
use of noun as adjective (“Several
-14-
sailor… uniforms were in my bag,”
“A boys… life is very simple,”
82.6%), and
correction on the existing grammatical error (17.4%).
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
For transposition errors, 11.4% were
deemed ungrammatical by the divergence
point. The grammatical completions were:
For transposition errors, 51.5% were
deemed ungrammatical by the divergence
point. The grammatical completions were:
•
•
¥
¥
¥
¥
¥
¥
use of noun as title or proper noun
(“Guest… number three entered
through the door”, “Announcer…
Chuck Hern is a very funny guy,”
32.3%),
use of noun as adjective (19.4%),
use of noun in a general sense
(14.5%),
use of displaced element as adjective
following noun (ÒWomen threeÉ decades ago did not have the same
rights as they do today,Ó 8.1%),
use of noun as interjection (3.2%),
use of noun with copula (3.2%), and
other grammatical completion following unmodified noun.
¥
¥
Determiner errors: For late determin-
Late errors:
Auxiliary errors: For omissions,
77.6% were deemed ungrammatical by the
divergence point. The grammatical completions were:
•
¥
¥
present-participial verb-phrase completion (42.4%),
correction on the existing grammatical error (3.0%), and
other grammatical completion following verb (ÒThose pilots were saying that several clouds coveredÉ the
entire sky,Ó 54.5%).
present-participial verb-phrase completion (86.6%),
corrections on the existing grammatical error (6.7%), and
use of verb gerund as adjective (ÒThe
young, new president of JohnÕs college speakingÉ school is an idiot,Ó
6.7%).
er errors, the divergence point for both
omission and agreement errors was also the
last word of the sentence; thus, 100% of the
correct responses in this cell were by the divergence point, by necessity. For transposition errors (where there is one more
elementÑthe displaced determinerÑafter
the divergence point) 30.8% were deemed
ungrammatical by the divergence point.
The grammatical completions were:
•
¥
¥
¥
For agreement errors, 100% were
deemed ungrammatical by the divergence
point.
¥
¥
-15-
use of noun as adjective (24.4%),
use of noun as title or proper noun
(8.9%),
correction on the existing grammatical error (6.7%),
Prepositional or gerundive phrase following unmodified noun (ÒThe magazine reporter was donating one
hundred dollars to hospitalsÉ treating AIDS,Ó 6.7%),
reduced relative clause (2.2%), and
other grammatical completion following unmodified noun (ÒGeorgeÕs
remaining dinner guests were drinking wineÉ and eating rolls,Ó 51.1%).
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Summary of results for Experiment 1
Native speakers offer a range of alternative completions for the 12 error types employed in these experiments at or after the
divergence point (i.e., the point at which the
stimuli deviate from each other and from
grammatical controls). These include many
grammatical or (in some cases) semi-grammatical completions.
Early auxiliary errors. Subjects provided grammatical completions to early
auxiliary omissions and transpositions at
the divergence point an average of 35% of
the time, less than for the corresponding
early determiner errors (see below), but
more than for early auxiliary agreement errors, for which subjects provided a grammatical completion at the divergence point
only 3% of the time. For both early auxiliary omissions and transpositions, about 90%
of all grammatical completions were
present-participial verb-phrase completions
such as, ÒMrs. Brown[,] working at the libraryÉÓ
Early determiner errors. Subjects
provided grammatical completions to early
determiner omissions (44%) and transpositions (88%) at the divergence point an average of 66% of the time, suggesting that to
some extent they believed the sentence to
be grammatical to that point in many cases,
but that there was also some doubt. Subjects
provided a variety of completions at this
point for both error types, including use of
the bare noun as proper noun or title (e.g.,
ÒPresidentÉ Clinton was briefed by his advisors.Ó), use of noun in the general sense
(e.g., ÒManÉ is a fragile creature.Ó), and
use of noun as adjective (e.g., ÒWomanÉ
doctorsÉÓ). Subjects provided grammatical completions to early determiner agreement errors at the divergence point an
average of 34% of the time, suggesting that
fewer believed the sentence to be grammatical at that point compared to the other two
early determiner error types. 83% of these
early determiner agreement error completions involved the use of the bare noun as an
adjective (e.g., ÒSeveral sailorÉ uniforms
were in my bag,Ó ÒA boy[Õ]sÉ life is very
simple,Ó), including many completions
where subjects mistakenly used a plural
noun as a possessive.
Late errors. As mentioned above, the
divergence point for both late determiner
omission and agreement errors was also the
last word of the sentence; thus, 100% of the
correct responses in this cell had to be before or at the divergence point. Subjects
provided grammatical completions to late
determiner transpositions at the divergence
point an average of 69% of the time, providing a variety of completions such as use of
noun as adjective, use of noun as title or
proper noun, correction of the grammatical
error, and prepositional or gerundive phrase
following unmodiÞed noun. Subjects never
provided grammatical completions to late
auxiliary agreement errors at the divergence pointÑi.e., if a subject indicated that
-16-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
the sentence was ungrammatical on the last
button press, they had indicated it by the divergence point. Subjects provided grammatical completions for the other two
auxiliary error types an average of 35% of
the time, with a large number of those corrections being present-participial verbphrase completions.
To summarize, subjects were more likely to provide grammatical completions at
the divergence point for errors appearing
early in the sentence than for those appearing late, for omission and transposition errors than for agreement errors, and for early
determiner errors than for early auxiliary
errors.
EXPERIMENT 2: Incremental Grammaticality Judgment
Experiment 2 is a self-paced, word-byword reading task where, after each word
appears, subjects press one of three buttons
(ÒgrammaticalÓ, ÒungrammaticalÓ, Ònot
sureÓ), indicating their judgment of the
grammaticality of the sentence to that
point. We expect subjectsÕ judgment of
grammaticality in this task to be quite consistent with the number and range of completions offered at each word in the cloze
experiment.
Method
Subjects. Subjects were thirty-Þve college students (Þve left-handed; twenty-two
female and thirteen male) who participated
in the experiment for course credit, or for a
payment of $7.00. All subjects stated that
they were native speakers of English.
Stimuli. The stimuli were identical to
those of Experiment 1.
Equipment. Each sentence was presented one word at a time, using an IBMPC/XT with a GoldStar 1210A amber
screen monitor. Subjects responded using a
Carnegie-Mellon button box, accurate to
one millisecond. Subjects responded with
one of three button presses: Ògood,Ó Òbad,Ó
or Ònot sure.Ó
Procedure. A trial began with a
ÒREADYÓ cue appearing near the bottom
center of the screen. The subject pressed the
middle button, corresponding to Ònot sure,Ó
-17-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
to bring the Þrst word of the sentence to the
screen. Subjects were instructed to use the
index Þnger of their dominant hand, and to
keep the Þnger at a home spot beneath the
middle key between button presses.
The sentence was centered vertically
and started at the left side of the screen.
Each button press brought the next word
onto the screen, until the entire sentence
was visible. After the last word appeared
the button press caused the next ÒREADYÓ
cue to appear.
The experimenter instructed subjects to
decide, after each word appeared upon the
screen, whether the sentence up to that
point was Ògrammatically correct.Ó We did
not elaborate upon what Ògrammatically
correctÓ meant, and if subjects asked, we
simply re-iterated that we wanted them to
decide whether the sentence was grammatically correct or incorrect. The experimenter told subjects that, once they believed that
the sentence had gone bad, they should continue pressing the ÒbadÓ button if they continued to believe that the sentence could no
longer be savedÑeven if the remainder of
that sentence seemed well formed. They
were instructed to press the ÒgoodÓ button
again only if they had changed their mind
about the error. During the instruction
phase, some subjects asked the experimenter whether a particular practice item was
correct or incorrect. When this occurred,
subjects were again told to base their re-
sponses on what they themselves considered to be correct grammar.
The actual experiment consisted of 168
trials of the same sentence stimuli as Experiment 1. Each subject received the sentences in a different random order, determined
by the controlling computer program. Subjects were told that they would receive a
break at the mid-point of the experiment
(after trial 84). At this point, instead of the
ÒREADY Ó cue, the subject received a
ÒPLEASE WAITÓ cue.
Subjects were given twenty practice
sentences before the actual experiment.
Both button presses and reaction time
were collected. Reaction time was measured from the onset of the current word to
the button press.
Scoring. A button press was recorded
for every word of every sentence. Reaction
time to each word was also recorded. The
following dependent variables were derived
from these data:
1. Final button press (a measure of overall
accuracy);
2. Normalized word-by-word button
press (explained below), to determine
the shape of the decision function for
each item type;
3. Normalized word-by-word reaction
time, a complementary measure of
the shape of the decision function.
This included only reaction times for
button presses before an ÒungrammaticalÓ response was madeÑi.e.,
-18-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
only button presses where subjects
were still making a decision about ungrammaticality.
Because individual sentences varied in
length, and we wished to compare several
different points across different sentences,
word-by-word data were temporally normalized (or aligned) in the following way
(see Figure 3): The Þrst button press of each
sentence was synchronized at ÒÞrst,Ó the
last button press at Òlast.Ó The divergence
point is labeled ÒzeroÓ on the graphs. In
those cases where a sentence either began
or ended on the divergence point, this point
was synchronized at zero and not at Þrst or
last. Words in between the Þrst point and
the divergence point, and between the divergence point and the last button press,
were binned and averaged within the bins.
For early errors, there was one bin between
the Þrst word and the divergence point, corresponding to all words between (and not
including) the Þrst button press and the divergence point (in fact, this bin existed for
early auxiliary but not early determiner errors, because there were no words between
the Þrst word and the divergence point for
early determiner errors). After the divergence point, data were binned into a Ò20%Ó
interval (corresponding to the Þrst 0-20%
of the sentence past the divergence point), a
Ò40%Ó interval (corresponding to the Þrst
20-40% of the sentence past the divergence
point), and so on. Because each sentence
was from eight to twelve words in length,
each bin roughly corresponds to one word.
The scheme for late errors was similar: The
Þrst bin corresponds to the Þrst button
press, followed by the ÒÑ80%Ó interval
(the Þrst 100-80% of the sentence before
the divergence point, excluding the Þrst
word), the ÒÑ60%Ó interval and so on.
Final button press refers to the judgments obtained on the last button pressed
for ungrammatical sentences. The Þnal button press was evaluated using AÕ to grammaticals and ungrammaticals combined. As
we noted above, AÕ is a non-parametric statistic used to correct for response bias (Grier, 1971; Pollack & Norman, 1964). As
such, it is similar to dÕ. Raw percent correct
scores for grammatical and ungrammatical
stimuli do not account for the possibility of
subject response bias. For example, a score
of 100 for ungrammatical stimuli (all ungrammatical stimuli correctly identiÞed)
could mean that the subject is perfect at the
taskÑor simply that the subject has an
overwhelming tendency to guess that a sentence is ungrammatical. This cannot be determined without looking at both hits and
false alarms. The above subject might also
have a false-alarm rate of 100, indicating
that in fact they are incapable of differentiating grammatical from ungrammatical
sentences and judge everything as ungrammatical. Conversely, a false-alarm rate of
0.00 (with a hit rate of 100) would constitute perfect performance. AÕ is a uniÞed statistic that corresponds to the underlying
percent correct in a two-option forced
-19-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
early errors
Mrs.
Brown
She
Þrst
>0%
working *
quietly
in
the
church
kitchen.
signing *
was
her
newest
and
biggest story
collection.
Girl *
was
eating
some
dark
chocolate ice
cream.
zero
<20%
<40%
<60%
<80%
<100%
last
late errors
A
small
and
harmless
black
dog
chasing*
The
magazine
reporter was
donating one
hundred
dollars to
hospitals*
Þrst
>80%
>60%
>40%
>20%
>0%
zero
was
chickens.
those.
<0%
last
Figure 3. Incremental grammaticality judgment bins
choice task, correcting for bias. It can range
from 50.0 (chance performance) to 100.0
(perfect discrimination). Of course, normal
subjects should not show such strong biases, but AÕ still permits discrimination of
subject accuracy differences that are more
subtle; for example, which subject is more
accurate, one with 90% hit rate and a 1%
false-alarm rate, or one with a 99% hit rate
and a 10% false-alarm rate? (for details, see
Pollack & Norman, 1964).
For AÕ, all subject responses for target
stimuli were used (i.e., ungrammaticals and
their grammatical controls). For each of the
twelve cell conditions and for each subject,
a signal-detection matrix was generated,
and ÒhitsÓ (correct judgments of ungrammaticality) and Òfalse alarmsÓ (incorrect
decisions that a sentence is ungrammatical)
were calculated.
To examine the on-line course of grammaticality judgments, the judgments made
at each word in the sentence (Òbad,Ó Ònot
sure,Ó and ÒgoodÓ) were translated into values of 0, 50, and 100, respectively. These
judgments were then averaged over subjects and over sentences within a cell.
Results and Discussion
Accuracy on grammatical and
ungrammatical targets
Overall accuracy levels, deÞned by the
subjectÕs decision on the last button press,
were very high in this experiment, averaging 94.6% (see Table 3). For ungrammatical
sentences, subjects used the Ònot sureÓ option at some point in their choice only 17%
of the time. For ungrammatical sentences,
subjects gave non-monotonic responses
only 3.8% of the time. The average AÕ over
subjects was a high 97, with no subject outliers (deÞned as any subject with an AÕ
more than 2.5 standard deviations from the
mean). Nor was any subjectÕs mean Òby
wordÓ reaction time more than 2.5 standard
-20-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Table 3. Incremental
Grammaticality Judgment
Experiment: Final button press on
target sentences
gram.
ungram.
ÒgoodÓ
Ònot sureÓ
ÒbadÓ
94.50%
2.36
3.14
4.01
1.29
94.69
Performance on individual sentences
was examined to determine whether any of
the sentences were outliers (deÞned as any
sentence with a response more than 2.5
standard deviations from the mean, by item
analysis). Only one sentence (#8.11 in Appendix I.C) met this criterion, classiÞed as
ungrammatical by only 29% of subjects.
This sentence was dropped from all further
analyses, and A' scores were calculated
with this sentence removed. A 2 ´ 2 ´ 3
ANOVA with subjects as the random factor
was conducted on these AÕ scores.
The following effects were signiÞcant
(in all cases in this report, signiÞcant means
p < 0.05): part of speech (F1(1,34) = 7.27,
p<0.0108; F2(1,71) = 13.10, p < 0.0006)
and part of speech ´ type (F1(2,68) = 9.70,
p<0.0002; F2(2,71) = 7.07 p < 0.0016); location ´ part of speech was also signiÞcant
by items only, F2(1,71) = 4.78, p < 0.04).2
Post-hoc tests were used to explore the
various signiÞcant effects; unless otherwise
stated, all post-hoc tests reported are Newman-Keuls. The pattern of accuracy for error types was different for the two parts of
speech: for auxiliary errors, omissions >
transpositions; for determiner errors, transpositions = agreement errors > omissions.
Subjects showed an effect of part of speech
only for omissions errors (auxiliary omissions > determiner omissions; also by
items) not for agreement or transposition
errors (see Figure 4).
100
98
accuracy (A-prime)
deviations from the grand experimental
mean of 935 msec (this relatively high value was due in part to last button press, as we
shall see later). For the bin-by-bin reaction
times reported below, individual reaction
time points greater than 2.5 standard deviations from the mean (more than 4100 msec)
were eliminated (this constituted removing
less than 2% of the data set). The A' analysis was conducted over subjects only, since
the logic of A' is difÞcult to apply in an
analysis by items.
B
96
O
auxiliary
B
O
BO
determiner
94
92
omission
agreement
error type
transposition
Figure 4. Incremental grammaticality
judgment. AÕ by error type and part of
speech
To summarize results for this analysis,
overall accuracy was very high across all
categories. For all subjects, there was a
-21-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
small disadvantage for violations involving
determiners (especially determiner omissions).
Normalized bin-by-bin judgments and
reaction times
In this section, we will begin with a global comparison of changes in judgment and
reaction time for ungrammatical sentences
and their grammatical controls. Early vs.
late errors will be handled separately, but all
other factors are collapsed at this level of
description. We do not present signiÞcance
tests at this point, but simply present the
overall shape of the data. Then we will
present detailed results for all twelve error
types, evaluated in four separate analyses of
variance over subjects (judgment and reaction times on early errors; judgment and reaction times on late errors). In each of these
analyses, part of speech, error type and sentence position (or ÒbinÓ) serve as withinsubject factors. To maintain our focus on
patterns of change over time (and to avoid
redundancy with the previous section), we
will restrict our discussion to main effects
and interactions involving the factor bin.
Figure 5A illustrates the normalized
bin-by-bin judgments observed on all ungrammatical targets with an early violation
(collapsed over part of speech and violation
type), compared with their grammatical
controls. The vertical axis represents the
mean rejection rate (i.e., mean percent
judged ungrammatical) at each point in the
sentence, from 0% (always judged gram-
matical) to 100% (always judged ungrammatical). The horizontal axis represents
percentage of the sentence read so far, normalized for sentence length, with zero being the divergence point (see Methods
section). The divergence point, which, recall, is the same structural point for omission, agreement and transposition errors,
indicates the point at which we might expect a divergence between comparable
grammatical and ungrammatical forms.
This is, in fact, exactly what we observe.
Notice, however, that the decision function
for grammatical controls is not ßat. Instead,
there is a slight rise in the false-alarm rate
that is most visible on the last word in the
sentence.
Figure 5B compares the bin-by-bin
judgments observed on all ungrammatical
targets with a late violation, compared with
their grammatical controls. Once again, we
see the predicted divergence between grammatical and ungrammatical sentences at the
divergence point. And we also see a slight
rise in false alarms for grammatical controls toward the end of the sentence (averaging 4%).
-22-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Incremental grammaticality judgment experiment: choice and reaction time by location
A: early errors: percent choice
▲
80
▲
▲
▲
B: late errors: percent choice
▲
100
▲
▲
80
▲
60
▲
60
40
40
▲
20
80%
▲
❍
❍
▲
❍
▲
❍
▲
▲
❍
❍
❍
❍
last
60%
❍
▲
<100 %
40%
0
0
20%
❍
< 0%
0
❍
Ñ20%
❍
Ñ40%
❍
Ñ60%
❍
Ñ80%
❍
first
❍
last
❍
▲
< 100%
❍
▲
< 0%
0
20
first
percent choice ungrammatical
100
percent past divergence point
▲
ungrammatical
❍
C: early errors: reaction time (msec)
❍ 1400
1400
grammatical
D: late errors: reaction time (msec)
❍
1150
▲
▲
▲
▲
900
❍
▲
❍
▲
❍
❍
650
❍
▲
❍
❍
▲
900
❍
▲
▲
❍
▲
❍
650
400
❍
▲
❍
▲
Ñ40%
▲
Ñ60%
1150
❍
▲
▲
❍
❍
❍
▲
last
<100 %
0
< 0%
Ñ20%
Ñ80%
last
< 100%
80%
60%
40%
20%
0
< 0%
first
400
first
reaction time (msec)
▲
percent past divergence point
Figure 5. Incremental grammaticality judgment: Choice and reaction time by
location
Figure 5C compares the bin-by-bin reaction times observed for ungrammatical
targets with an early violation with their
grammatical controls (collapsed over part-
of-speech and violation type; recall that
these are only reaction times for button
presses before an ÒungrammaticalÓ response was made, and that individual data
-23-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
points more than 2.5 standard deviations
from the overall mean were eliminated).
Here (as in Figure 5A for bin-by-bin judgments), we would expect a marked divergence in reaction times beginning around
the divergence point, with subjects slowing
down as soon as they detect a potential error. This is, in fact, what we observe. Notice, however, that the reaction time
function for grammatical controls is not
ßat. We might have expected roughly
equivalent reading times at every point
across the course of the sentence. Instead,
we observe a slight slowing of reaction
times toward the end of the sentence (compared to the middle) for grammatical controls, including a great increase in reaction
time at the last button press. The gradual
deceleration in grammatical sentences before the last press may reßect increased processing load and/or increased caution as
information accumulates and the end of the
sentence nears, while the particularly sharp
increase at the Þnal word may be due to
subjects making a Þnal check for potentially missed errors. For ungrammatical sentences, the predicted increase in reaction
times after the divergence point is followed
by a gradual drop back to the pre-error
baseline. Ungrammatical sentences also
show this effect, but in this case the effect
appears to be restricted to the last word in
the sentence. Figure 5D presents bin-by-bin
reaction times for ungrammatical sentences
with a late violation, compared with grammatical controls. Like Figure 5B for judg-
ments, this Þgure also shows a marked
divergence in reaction times around the divergence point. However, there is much less
divergence between late violations and
their grammatical controls, due to a confound between error detection (which slows
reaction times toward the end of the sentence for items with a late violation) and the
last-press elevated-reaction time effect described earlier (which slows reaction times
at the end of the sentence for grammatical
controls).
The twelve error types analyzed separately: Finally, let us turn to the patterns
of change associated with each of the 12 error types. As noted, there were four separate
analyses of variance: early judgments, late
judgments, early reaction times and late reaction times; for reaction times, missing
observations were replaced with cell
means. We will restrict ourselves here to effects involving the factor ÒbinÓ. All four
analyses yielded a very large main effect of
bin (p < 0.0001 in every case, also by
items), a signiÞcant two-way interaction
between bin and error type (p < 0.0001 in
every case, also by items except for early
reaction times, p < 0.03), and a signiÞcant
two-way interaction between bin and part
of speech (p < 0.0001 in every case; p<
0.015 by items). Most interesting for our
purposes here are the three-way interactions of bin, part of speech and error type.
This three-way interaction reached signiÞcance in the analysis of early decisions
-24-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
bin
early errors
factor
0%
20%
40%
60%
late errors
< 100%
last
0%
50%
part of speech
type X part of speech
type
part of speech
type X part of speech
Figure 6. Incremental grammaticality judgment: Analyses of Variance
(F1(12,408) =7.13, p < 0.0001; F2(12,216)
= 2.75, p < 0.0017), the analysis of late decisions (F1(12,408) = 22.44, p < 0.0001;
F2(12,210) = 3.93, p < 0.0001), the analysis
of early reaction times (F1(12,408) = 8.76,
p < 0.0001; F2(12,216) = 3.00, p<0.0007),
and the analysis of late reaction times
(F1(12,408) = 3.56, p < 0.0001; F2(12,210)
= n.s).
Figure 6 presents a more detailed, bybin breakdown of signiÞcant effects for early and late errors separately.
Figure 7A to Figure 7C present the binby-bin judgments observed for early auxiliary errors (as analyzed just above) for
omissions, agreement errors, and transpositions. Both Ònot sureÓ responses (striped)
and ÒbadÓ responses (light gray) are shown.
Agreement errors are resolved fairly quick-
100
percentage button press
ÒungrammaticalÓ
Ònot sureÓ
type
80
60
40
20
0
first <0% 0% 20% 40% 60% 80% <100 last
percent past divergence point
Figure 7A. Early auxiliary omission,
Choice
ly, and all three error types have signiÞcantly different rejection rates at the zero point
(using Newman-Keuls): agreement (67%)
> omission (44%) > transposition (34%).
Agreement errors reach 92% by the next interval (i.e., the Ò20%Ó interval), where they
are still signiÞcantly higher than the other
two error types.
-25-
last
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
100
percentage button press
percentage button press
100
80
60
40
20
0
first <0% 0% 20% 40% 60% 80% <100 last
percent past divergence point
80
60
40
20
0
first <0% 0% 20% 40% 60% 80% <100 last
percent past divergence point
Figure 7B. Early auxiliary agreement,
Choice
iliary errors, with all three error types on
one graph.
1500
●
●
1250
reaction time (msec)
Omission and transposition errors
switch rank order at the Ò20%Ó interval
(which corresponds to the displaced element on transposition errors), with transpositions at 75% and omissions signiÞcantly
less at 59%. This suggests that the second
piece of information (a displaced auxiliary
verb) is sufÞcient to quell doubts for most
of our subjects.
Figure 7C. Early auxiliary transposition,
Choice
▲
▲
●
▲
●
●
▲
1000
750
●
▲
■
●
▲
■
■
■
●
■
▲
■
< 0%
0
■
■
●
▲
▲
■
500
250
0
first
20%
40%
60%
80% < 100% last
percent past divergence point
By contrast, omission errors show a
more protracted rise in rejection rates
across the rest of the sentence, with many
subjects refusing to make up their minds
before the Þnal button press. The pattern
agreement > transposition > omission obtains up to and including the Ò40%Ó interval. After that point, transpositions rise to
meet agreement, and the two are no longer
signiÞcantly different, while omissions remain signiÞcantly less than these two until
all three converge at the last button press.
Figure 7D presents complementary data
for bin-by-bin reaction times on early aux-
▲
omission
■
agreement
●
transposition
Figure 7D. Early auxiliary errors, Reaction time
Recall that these data are only for responses
where the subject has not yet indicated ÒungrammaticalÓ to the sentenceÑi.e., where
the subject is still deciding. Reaction times
Þrst show a signiÞcant difference between
types at the divergence point: agreement
(709 msec) < transposition (1,129 msec) =
omission (1,247 msec). Both omissions and
transpositions show a signiÞcant reaction
time jump from the interval just before the
-26-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
divergence point to the divergence point (in
this case using paired t-tests3). At the next
interval, the Ò20%Ó interval, omissions
(1,224 msec) and transpositions (1,332
msec) switch order from the interval before,
though again the difference is not signiÞcant; both are still signiÞcantly slower than
agreement errors (820 msec). Reaction
times for omissions and transpositions
show a slow drop over the rest of the sentence, until just before the last word. The
pattern of agreement errors being signiÞcantly faster than the other two error types
obtains up to and including the Ò60%Ó interval; after that, the differences between
error types do not reach signiÞcance. Note
that on the last word, the reaction time for
omissions continues to drop, while that for
both agreement and transposition rises,
though statistical analyses including reaction time from the last word could not be
performed due to too many missing observations (recall that by this point most subjects have pressed the ÒungrammaticalÓ
button, and thus almost all reaction times
for this interval have been culled).
To summarize so far, grammaticality
judgments on these three early auxiliary error types result in markedly different binby-bin judgments and reaction times, corresponding to the amount of ambiguity and/or
the number of cues to violation associated
with each violation type, based upon our
experience from Experiment 1. For judgments, responses diverge at the divergence
point with the proÞle agreement > omission
> transposition; at the next interval, agreement has nearly reached asymptote, and
transposition has risen to overtake omission. Both transposition and omission continue to rise over the rest of the sentence,
with transposition reaching asymptote by
the Ò60%Ó interval and omission not reaching it until the last word. For reaction times,
agreement errors are essentially constant
over the course of the sentence (at least until the last word), while both transposition
and omission errors rise signiÞcantly at the
divergence point. This jump appears to continue until the moved element (the Ò20%Ó
interval) for transpositions and then slowly
decrease, while for omissions the decrease
begins immediately after the divergence
point. The correspondences between judgment and reaction time for all three error
types are also interesting: The quickly-resolved agreement errorsÕ relatively constant
reaction time suggests that the perceived divergence point is rather punctate; that is,
subjects either catch the error (and they
usually do), or they miss it entirely. By contrast, the elevated reaction times at and after
the divergence point for omissions and
transpositions make sense in the context of
the more extended decision region that
these two error types evince, as they suggest a Òzone of uncertaintyÓ in which subjects, at any one interval, are hesitant to
commit to an ÒungrammaticalÓ button press
(hence the slower rise of the functions) but
are also uncertain about the possibility of
-27-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
100
percentage button press
percentage button press
100
80
60
40
20
0
0%
20%
40%
60%
80% <100
percent past divergence point
80
60
40
20
0
first
last
Figure 8A. Early determiner omission,
Choice
Figure 8A to Figure 8C display the binby-bin judgments observed with early determiner errors, while Figure 8D presents
the bin-by-bin reaction times for the same
items.
Recall that for both omissions and
transpositions the divergence point is always the Þrst word of the sentence. The
same error type proÞle at the divergence
point as for early auxiliaries is seen, although the overall mean is lower, and all
three error types have signiÞcantly different
rejection rates: agreement (35%) > omission (12%) > transposition (4%). Agreement errors reach 93% by the next interval,
the Ò20%Ó interval, where they are still signiÞcantly higher than the other two error
20% 40% 60% 80% <100
percent past divergence point
last
Figure 8B. Early determiner agreement,
Choice
types (both 61%). By the next interval, the
Ò40%Ó interval (the interval after the
moved element for transpositions), transpositions (93%) have risen to the point where
agreement = transposition > omission
(79%). Omissions slowly rise up over the
course of the sentence, but are still signiÞcantly less than the other two error types up
to and including the Ò80%Ó interval.
100
percentage button press
the sentenceÕs continuing grammatically
(hence the concomitant rise in reaction
times). These results suggest that the binby-bin reaction times can be interpreted as
a kind of conÞdence rating for each judgment.
0%
80
60
40
20
0
0%
20%
40%
60%
80% <100
percent past divergence point
last
Figure 8C. Early determiner transposition, Choice
In general, these results parallel our
Þndings for early auxiliary errors: Agreement errors are resolved rather quickly,
-28-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
bare noun phrase is actually working as a
modiÞer (e.g. ÒBoyÉ George is a very
strange person,Ó). However, the next cue is
apparently not sufÞcient to remove all
doubts about ungrammaticality. Shortly after the Ò20%Ó interval, the rejection rates
for early determiner transpositions move
toward asymptote (at about the Ò40%Ó interval). Omissions take considerably longer, and do not reach the 95% rejection level
until the end of the sentence in many cases.
1500
1250
reaction time (msec)
while omissions take much longer to resolve, and transposition errors fall somewhere in between. However, a comparison
of Figure 7 (early auxiliary errors) to Figure
8 (early determiner errors) suggests some
subtle but interesting differences between
auxiliaries and determiners. First, on auxiliary agreement errors, subjects tended to
make their decision directly at the verb
(e.g., ÒThe boy were * + walking,Ó with a
67% rejection rate by the divergence point),
while they tended to wait for one more
word after the determiner agreement error
occurred (e.g., ÒA girls * were + walking,Ó
with a 35% rejection rate by the divergence
point and a 97% rejection rate at the next interval). This delay may be due to competition from an alternative completion for
early determiner agreement errors (e.g., ÒA
boy[Õ]s lifeÉÓ even though the punctuation
provided on the screen is not compatible
with a completion of that kind). A similar
conclusion applies to the other two early
determiner types. On early determiner
omissions (e.g., ÒBoy * was +ÉÓ) and early determiner transpositions (e.g., ÒBoy *
the + wasÉÓ), subjects do not start rejecting the sentence until after the divergence
point has passedÑuntil at least one more
word (the Ò20%Ó interval). In other words,
they do not judge the error at the divergence
point. In fact, these subjects are right: As
we saw in the Experiment 1, many of the
sentences with early determiner omissions
or transpositions can be salvaged at the divergence point by completions in which the
●
▲
■
●
1000
●
▲
▲
750
●
●
■
▲
▲
■
●
▲
■
■
■
500
▲
●
■
■
250
0
first
0
20%
40%
60%
80%
< 100%
last
percent past divergence point
▲
omission
■
agreement
●
transposition
Figure 8D. Early determiner errors, Reaction time
Figure 8D presents the bin-by-bin reaction times observed for early determiner errors, with all three error types on one graph.
This proÞle matches that for early auxiliary
error reaction times in its gross outlines,
with an elevated reaction time early in the
sentence, a gradual decline in reaction time
over the rest of the sentence (although
transpositions rise again at the Ò60%Ó interval), and with agreement errors notably
faster than the other two types. However,
type only reaches signiÞcance at the Ò20%Ó
interval, with agreement signiÞcantly faster
-29-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
100
percentage button press
percentage button press
100
80
60
40
20
0
first -80% -60% -40% -20% -10% 0 50% last
percent past divergence point
80
60
40
20
0
first -80% -60% -40% -20% -10% 0 50% last
percent past divergence point
Figure 9A. Late auxiliary omission,
Choice
Figure 9B. Late auxiliary agreement,
Choice
than the other two error types. For agreement errors, reaction time at the divergence
point is signiÞcantly slower than at the intervals on either side (using t-tests); for
omission errors, reaction time at the divergence point (for this error type, always the
Þrst word) is signiÞcantly faster than at the
next interval. There were too few data
points at the last word for a meaningful statistical analysis.
time jump at or just after the divergence
point, followed by a slow reaction time
drop over the rest of the sentence. Agreement errors, with a more punctate decision
point, were faster overall, with either no reaction time change at the divergence point
(for auxiliary errors) or with a reaction time
jump directly at the divergence point that
fell again at the next interval (for determiners).
To summarize for early errors, subjects
reject sentences at (for auxiliary errors) or
just after (for determiner errors) the divergence point for agreement errors. Transposition error rejection rates rise more slowly,
with most rejections not occurring until the
moved element has appeared. Omissions
have the slowest rejection rise time, taking
most or all of the sentence to rise to asymptote. Reaction times tended to reßect judgment patterns. The two error types with the
more protracted decision regions, omissions and transpositions, showed a reaction
Let us turn now to the six late-error
types, displayed in Figure 9 and Figure 10.
Figure 9A to Figure 9C present bin-by-bin
judgments associated with late auxiliary errors, while Figure 9D presents complementary information on bin-by-bin reaction
times for these items.
In contrast with the early auxiliaries
(Figure 7), the late auxiliary errors are all
resolved fairly quickly compared to some
early errors, peaking at or shortly after the
divergence point. To some extent this Þnding was inevitable, because these errors are
-30-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
1500
▲
1250
80
reaction time (msec)
percentage button press
100
▲
●
●
1000
60
40
750
■
▲
●
●
▲
■
●
▲
■
●
■
▲
●
▲
■
■
●
■
▲
●
■
■
500
▲
250
20
0
first
0
first -80% -60% -40% -20% -10% 0 50% last
percent past divergence point
▲
Figure 9C. Late auxiliary transposition,
Choice
The three error types diverge at the divergence point, with all three signiÞcantly
different in the order agreement (77%) >
omissions (65%) > transpositions (33%). At
the next interval (which comprises all
words after the divergence point except for
the last word of the sentence), agreement
has risen to 97%, signiÞcantly higher than
either transposition (76%) or omission
(78%). At the last word, all three error types
have risen to between 96% (transpositions)
and 99% (omissions and agreement errors);
the difference is signiÞcant.
Reaction times diverge at the divergence point as well; omissions jump significantly (from 777 msec at Ò<0%Ó to 1290
msec at the divergence point, using a t-test)
as do transpositions (from 755 msec to
1111 msec). Agreement errors dropped signiÞcantly (from 861 msec to 709 msec).
omission
■
agreement
●
<100 % last
transposition
Figure 9D. Late auxiliary errors, Reaction time
Agreement errors are signiÞcantly faster
than the other two error types both at the divergence point and at the next interval. The
further increase in reaction time from the
divergence point to the next interval is signiÞcant for transpositions only. Again,
there were too few data points at the last
word for a meaningful statistical analysis.
100
percentage button press
located at the end of the sentence, where
subjects are forced to make a quick decision.
Ñ80% Ñ60% Ñ40% Ñ20% < 0%
0
percent past divergence point
80
60
40
20
0
first
-80% -60% -40% -20% -10%
percent past divergence point
0
Figure 10A. Late determiner omission,
Choice
Finally, Figure 10A to Figure 10C
present the bin-by-bin judgments observed
on late determiner errors, while comple-
-31-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
100
percentage button press
percentage button press
100
80
60
40
20
0
first
-80% -60% -40% -20% -10%
percent past divergence point
80
60
40
20
0
first
0
Figure 10B. Late determiner agreement,
Choice
By necessity (because the divergence
point so often corresponds to the last word
in the sentence), all of these errors are resolved fairly quickly, compared to some
early errors. At the divergence point, transpositions (which still have one more word
to go, by necessityÑthe moved element)
are signiÞcantly lower (23%) than the other
two error types (90% or better). The increased reaction time at the divergence
point for transpositions is signiÞcant, using
a t-test, (from 724 msec to 1,064 msec) as is
the decreased reaction time for agreement
errors (857 msec to 643 msec).
In summary, late errors by necessity
show fewer between-type differences due
to the small interval between the error and
the end of the sentence. Nevertheless, some
interesting generalizations can be made:
Again, agreement errors stand out in that
they are resolved sooner than the other two
last
Figure 10C. Late determiner transposition, Choice
error types (although this effect may not be
seen very clearly in the late determiner case
due to the confound of the end-of-sentence
effect).
1500
1250
reaction time (msec)
mentary information on bin-by-bin reaction
times is presented in Figure 10D.
-80% -60% -40% -20% -10%
0
percent past divergence point
●
1000
750
■
●
▲
■
●
▲
▲
■
●
●
■
▲
●
▲
■
■
▲
●
▲
■
●
500
250
0
first
Ñ80% Ñ60% Ñ40% Ñ20%
< 0%
0
last
percent past divergence point
▲
omission
■
agreement
●
transposition
Figure 10D. Late determiner errors, Reaction time
Transpositions and omission errors again
show a jump at the divergence point
(though again, for omissions this effect may
not be seen clearly due to the confound of
the end-of-sentence effect), with agreement
errors being faster than the other two error
types at and after the divergence point.
-32-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
The ÒProtracted Decision RegionÓ of
Early Auxiliary OmissionsÑan Artifact?
One possible interpretation of the Òprotracted decision regionÓ especially seen in
early auxiliary omission errors is that it is
not a function of individual subjects making up their minds throughout that region,
but rather that it is an artifact of averaging
over items, or over subjects, or both.
Subject variance: assume that one
subset of subjects, Òearly respondersÓ, usually indicates that the sentence is ungrammatical at or just after the divergence point,
while the other subset, Òlate respondersÓ,
entertain potential grammatical (or semigrammatical) completions until nearly the
end of the sentence. Averaging over cells
might then produce an effect that could be
wrongly interpreted as a protracted decision region for individual subjects (note,
however, that the slow rise of the decision
function over the course of the sentence,
rather than a sudden jump at the divergence
point, a plateau, and then another jump near
the end of the sentence disproves at least the
strong version of this argument).
Item variance: Similarly, subjects
(some or all) might be inclined to always
indicate an ungrammaticality immediately
at the divergence point for some items, and
near the end of the sentence for other items,
again producing the spurious appearance of
what appeared to be an across-item decision region.
To test the Þrst possibility we examined
only early auxiliary omission errors, by
items, and calculated the Òmean words past
divergence pointÓ for each subject; i.e., the
average number of words past the divergence point for each sentence that subjects
Þrst pressed the ÒungrammaticalÓ button. If
subjects fall into two classes, ÒearlyÓ and
ÒlateÓ responders, then individual subjects
should have relatively little variability on
this measure regardless of exactly where in
the sentence they tend to respond. However,
the average standard deviation over subjects in this cell was 2.48, indicating that
even individual subjects varied on where
they believed the sentence had become ungrammatical, for this cell of the design.
To test the second possibility, we performed a mean split on the subjects and
proceeded with those subjects whose standard deviation on this measure was above
the group mean. By eliminating low-variance subjects, we reduced the chance of
creating the appearance of variance on particular items by pooling early responders
and late responders together. We then examined each itemÕs variance, over subjects.
Six of the seven items ranged in standard
deviation from 2.4 to 3.4 (sentence 1 had a
standard deviation of 1.0; when all subjects
were included standard deviation ranged
from 2.0, again for sentence 1, to 3.3).
Thus, there is a fair degree of within-item
variability (with the possible exception of
sentence 1), and therefore the variability of
-33-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
the Òprotracted decision regionÓ is not an
artifact of subject variability or item variability.
Although it seems clear that the apparent Òdecision zoneÓ is not an artifact of collapsing over items or subjects, a third issue
remains: how are reaction times and decisions related within this decision zone, for
individual subjects struggling with an individual item? If subjects are entering into a
protracted phase of indecision, then we
might expect to Þnd that reaction times increase well before the point at which a decision is Þnally made. To investigate this
possibility, we began by locating the point
at which individual subjects switched from
ÒgoodÓ to either Ònot sureÓ or ÒbadÓ for the
Þrst time, for each item. We will call this the
Òzero point.Ó If all decisions are really
punctate (and the protracted decision zone
is an artifact of averaging), then the space
between elevation of reaction time and the
decision to reject a sentence should be very
small, and it should be the same for all item
types. Two patterns are possible under the
artifact interpretation:
1. reaction times do not go up until the
point where decisions change, or
2. reaction times go up at the button
press just before the point where decisions change (as subjects get ready to
move their Þngers from one button to
another), but not before.
Furthermore, this pattern should be the
same for all the major violation types.
If, on the other hand, the major violation types differ in the relation between reaction time increase and button press at the
level of individual items and individual
subjects, then we can conclude with some
conÞdence that these variations in the size
of the decision region are not an artifact of
averaging. To ask this question, we began
by locating the point at which individual
subjects switched from ÒgoodÓ to either
Ònot sureÓ or ÒbadÓ for the Þrst time, for
each item. We will call this the zero point.
We then examined the reaction time for
each of the Þve words prior to that zero
point, which we will call:
•
¥
1 (the word prior to the button press
shift),
2 (the word before -1), -3, -4 and -5,
respectively.
This analysis was conducted on early
items only (both auxiliaries and determiners), and because of variations in sentence
length, the number of items contributing to
each cell is necessarily smaller the farther
back we go (e.g. sentences on which the
zero point occurred on the second word can
only furnish a single -1 point, and no reaction time measurements from -2 through 5).
If, for example, auxiliary omissions
yield a longer decision region than auxiliary transpositions, then the reaction times
from points -2 to -5 should be larger for
auxiliary omissions.
-34-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
1200
1000
J
H
800
J
B
600
400
-5
H
H
J
H
J
H
B
B
B
B
-4
-3
-2
words from first "bad" press
omission
B
agreement
J
J
reaction time (ms)
reaction time (ms)
H
1200
H
J
1000
J
800
600
B
B
400
-1
-5
H
transposition
Figure 11A. Incremental grammaticality
judgement: Reaction times before first
ÒbadÓ press for early auxiliary errors
B
J
H
H
J
BH
J
H
B
-4
-3
-2
words from first "bad" press
omission
B
agreement
J
-1
transposition
Figure 11B. Incremental grammaticality
judgement: Reaction times before first
ÒbadÓ press for early determiner errors
Two separate analyses of variance were
conducted, one for auxiliaries and another
for determiners. Each of these involved a 3
(omission vs. transposition vs. substitution)
by 5 (position -1 to position -5) within-subjects design.
15.64, p < 0.0001, where transposition >
omission > agreement). The difference was
in the predicted direction for position -5 as
well, but this analysis was not reliable (due
perhaps to the small number of items contributing to this cell for each subject).
For early auxiliaries, the analysis yielded no main effect of word position, but a
s i g n i Þc a n t m a i n e ff e c t o f t y p e
(F(2,68)=4.70, p < 0.02) and a signiÞcant
interaction between type and word position
(F(8,188)=3.80, p < 0.00004).
For early determiners, the only signiÞcant effect of the analysis of variance was a
main effect of word position (F(4,87)=3.42,
p < 0.02). However, results were in the
same general direction reported above, with
larger differences between item types at the
earlier positions (see Figure 11B).
The interaction is illustrated in Figure
11A. Post-hoc analyses at each position
showed that the error type difference was
signiÞcant at position -4 (F(2,36)=6.70, p <
0.004, where omission > transposition =
agreement), at position -3 (F(2,58)=6.83, p
< 0.003, where omissions = transposition >
agreement), at position -2 (F(2,68)=4.47, p
< 0.02, where omission = transposition >
agreement), and at position -1 (F(2,68) =
This analysis of the relationship between reaction time and judgment demonstrates that some subjects are sensitive to
some errors well before they are willing to
register their decision, certainly more than
just a word or two. Furthermore, it shows
that the distance between this sensitivity
(manifest in the point in the reaction time
data where reaction time begins to increase)
-35-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
and the eventual decision (manifest in the
button press) is different for different error
types. This is what we mean by a Òprotracted decision regionÓ.
Comparison of Experiments 1 and 2
In Experiment 2, early determiner
omissions and transpositions were still often acceptable to our subjects at the divergence point, suggesting that they still have
a range of possible completions in mind.
However, the reaction time data suggest
that subjects are already doubtful about the
grammatical status of these items. Experiment 1 is consistent with this. For these two
error types, subjects provided a grammatical completion past the divergence point an
average of 66% of the time. Subjects provided a variety of completions at this point,
including use of the bare noun as proper
noun or title (e.g., ÒPresidentÉ Clinton
was briefed by his advisors.Ó), use of noun
in the general sense (e.g., ÒManÉ is a fragile creature.Ó), and use of noun as adjective
(e.g., ÒWomanÉ doctorsÉÓ).
In addition, in Experiment 2 early determiner agreement errors (e.g., ÒA girls
*ÉÓ) appeared to be resolved approximately one word later than early auxiliary agreement errors (e.g., ÒJohn are * ÉÓ), again
consistent with Experiment 1. Subjects in
the cloze experiment provided completions
at the divergence point for 34% of the early
determiner agreement error sentences,
compared with only 3% for the corresponding auxiliary errors. 82.6% of these com-
pletions involved the use of the bare noun
as an adjective, such as, ÒSeveral sailorÉ
uniforms were in my bag,Ó including many
completions where subjects mistakenly
used a plural noun as a possessive, such as,
ÒA boy[Õ]sÉ life is very simple.Ó
Finally, in Experiment 2 early auxiliary
omission errors started to be perceived as ill
formed at the divergence point, but many
subjects were still unwilling to make up
their minds about these error types until the
very last word in the sentence. In between,
there was a long and monotonic drop in acceptability (i.e., a true Òdecision regionÓ),
with substantial variability over individual
subjects and items. Experiment 1 was also
consistent with this, with subjects delaying
their decision on early auxiliary omissions
as they considered a participial interpretation such as, ÒMrs. Brown[,] working at the
library[,] isÉÓ (even though punctuation
did not support this interpretation, and most
of the item types within this cell involve
unique referentsÑproper nouns, pronouns
or other unique individualsÑthat are unlikely candidates for such a participial int e r p r e t a t i o n ) . S u b j e c t s p r ov i d e d
completions at the divergence point for
30% of the sentences of this type. All of
these completions involved either presentparticipial verb phrase completions or gerund + ÒthatÓ clause completions.
The incremental grammaticality judgment (GJ) and cloze procedures yield very
similar results. To quantify this observa-
-36-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
tion, we calculated for the data of Experiment 2 the Òmean words past divergence
pointÓ measure for each item, as done in
Experiment 1. The experiments correlated
signiÞcantly on this by items measure, with
a Pearson correlation coefÞcient of 0.83 (p
< 0.0001). Interestingly, in Experiment 2
the grand mean for Òmean words past divergence pointÓ was higherÑthat is, subjects
also tended to wait longer to give an ÒungrammaticalÓ response in Experiment 2, a
point to which we shall return.
This signiÞcant 0.83 correlation lends
support to the notion that both experiments
are tapping into essentially the same underlying, ongoing structure-building process.
In addition, the proÞle of means in both experiments were almost identical (although
the exact patterns of signiÞcance revealed
by post-hoc tests were somewhat different).
In both experiments, for early auxiliary errors, the order of means was omission >
transposition > agreement; for late auxiliary, transposition > omission > agreement.
In both experiments, for early determiner
errors, agreement errors had the lower
mean, while for late determiner errors,
transposition errors always had the higher
mean (for structural reasons already discussed).
What is going on in this Òdecision regionÓ from the subjectÕs point of view? Are
they conscious of the error at the point
where reaction times start to increase? Are
they postponing a decision until more infor-
mation is available, and all possible completions have been eliminated? Or have
they made their judgment (i.e. they already
ÒknowÓ that the sentence is bad), but have
decided for some reason to postpone a Þnal
button press (like an engaged couple who
are not quite ready to announce their plans
to the family)? The strong correlations that
we have observed between performance in
the Cloze experiment and performance on
the judgment tasks suggests that the subjects are still weighing alternatives. However, the experiments presented here do not
permit us to draw strong inferences about
the phenomenology of grammaticality
judgment, i.e. we do not know what is going on in our subjectsÕ minds, before, during or after the proposed Òdecision regionÓ.
It is possible that the distinction between
sensitivity to error (a perceptual event) and
the decision to push a button (a form of motor planning) could be disentangled with
another methodology (e.g. event-related
brain potentials). For present purposes,
however, we can conclude with some conÞdence that grammatical violations differ in
the amount of time required to register a decision.
Summary of results for Experiment 2
Experiment 2 has yielded a great deal of
information about the time course of grammaticality judgment, much of it consistent
with Experiment 1, and summarized brießy
as follows:
1. Accuracy. Overall, end-of-sentence ac-
-37-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
curacy was very high in this experiment, averaging around 95% correct
rejections for ungrammatical sentences
and 95% correct acceptances for their
grammatical controls. An analysis of
variance A' scores (corrected for response bias) yielded very few differences among the various error types,
although performance was slightly
worse overall for determiner omissions.
the latter Þnding is probably due to
the uninteresting fact that subjects are
forced to make up their minds by the
presence of a period signaling the end
of the sentence. For all the remaining
violation types, we have to abandon
the punctate view in favor of something that is best described as a Òdecision region.Ó This conclusion is
forced by the following facts:
a. Early determiner agreement errors (e.g., ÒA girls *ÉÓ) appear
to be resolved approximately
one word later than early auxiliary agreement errors (e.g., ÒJohn
are * ÉÓ). To explain this difference, we noted that in Experiment 1 subjects provided
completions such as, ÒA boy[Õ]s
life is very simpleÓÑdespite the
fact that such completions
should be ruled out by the absence of an apostrophe to signal
a possessive reading.
b. Early auxiliary omission errors
start to be perceived as ill formed
at the divergence point, but many
subjects are still unwilling to
make up their minds about these
error types until the very last
word in the sentence. In between, there is a long and monotonic drop in acceptability (i.e., a
true Òdecision regionÓ), with
substantial variability over individual subjects and items. This is
also consistent with Experiment
1, where subjects delayed their
decision on early auxiliary omissions because they were still
considering a participial interpretation such as, ÒMrs.
Brown[,] working at the libraryÉÓ However, most of the
item types within this cell involve unique referents (proper
2. Relationship between judgments
and reaction times. There were striking parallels between the decision and
reaction time data, suggesting that the
word-by-word reaction times obtained with this paradigm can be
viewed as an indirect index of the degree of conÞdence associated with
grammaticality judgments at each
point in the sentence, as well as, perhaps, a decision process in which subjects attempt to generate alternatives.
In general, both sources of information (word-by-word decisions and reaction times) offer useful and
complementary information about the
time course of grammaticality judgment.
3. Size and shape of the decision function. The twelve relatively simple error types that we have manipulated
here are associated with markedly
different decision functions, with a
signiÞcant correlation between the
cloze and incremental GJ techniques.
For some error types, it seems fair to
conclude that there is a single decision point, located at or close to our
predetermined divergence point. This
is true for early auxiliary agreement
errors, and it is true for most errors located late in the sentenceÑalthough
-38-
BLACKWELL, ET AL.
nouns, pronouns or other unique
individuals) that are unlikely
candidates for such a participial
interpretation (see Appendix I).
Such interpretations would be
possible with a different form of
punctuation (e.g., a non-restrictive clause such as ÒMrs. Brown,
working at the library, called
home to sayÉÓ). But no such
punctuation was provided in this
experiment. Perhaps our subjects
delay their decisions on early
auxiliary omission items because of their partial overlap
with or resemblance to participial constructions. In addition,
these issues could be resolved by
further studies systematically
varying the number, frequency
and degree of plausibility of
competing sentence completionsÑa point to which we shall
return later.
c. Early auxiliary transposition errors are resolved in at least two
steps: Rejection rates start to go
up at the divergence point (where
omissions and transpositions are
still equivalent), with a sharp increase at the next word (the displaced auxiliary, which serves as
a second cue). Still, these errors
do not reach asymptote until
about 60% past the divergence
point (i.e., roughly six words after the divergence point), sug-
The Time Course of Grammaticality Judgment
gesting that many subjects are
unwilling to make up their minds
until the end of the sentence. A
similar second-cue effect is observed on late auxiliary errors,
although these items are then
forced to asymptote by punctuation signaling the end of the sentence.
d. Early determiner omissions and
transpositions are still acceptable to our subjects at the divergence point, consistent with the
range of completions subjects
provided in Experiment 1 (e.g.,
ÒBoy GeorgeÉÓ). However, the
reaction time data suggest that
subjects are already doubtful
about the grammatical status of
these items. For approximately
half the subjects (and/or half the
items), this suspicion is conÞrmed by the next word. Nevertheless, judgments and reaction
times associated with these early
determiner items do not reach asymptote until 40% past the divergence point.
It seems fair to conclude that grammaticality judgment is a matter of degree, a
protracted and variable process. To what
extent is this result an artifact of the wordby-word judgment task itself? To answer
this question, we proceed to our third and Þnal experiment.
-39-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
EXPERIMENT 3: Rapid Serial Visual
Presentation
In order to prove that our results are not
an artifact of incremental presentation and
judgments (which certainly are further
from the processing that occurs in real life),
the third experiment tests subjects with the
same stimuli using the rapid serial visual
presentation (RSVP) paradigm.5
Method
Subjects. Subjects were thirty-two
UCSD students who completed the experiment either for course credit or for a $7 payment. One subject was dropped from
subsequent analyses for having AÕ scores
more than 2.5 standard deviations from the
mean (see below). Of the thirty-one remaining subjects, twenty-Þve were male and two
were left-handed. All subjects were native
speakers of English.
Stimuli. The materials were the same as
those used in Experiments 1 and 2.
Equipment. The experiment was conducted on an IBM PC-XT, using a GoldStar
1210A amber screen monitor and a Carnegie-Mellon University button box, accurate to one millisecond. Stimuli appeared at
the center of the screen, one word at a time.
Procedure. Subjects Þrst practiced using the button box for twenty trials. First,
the word ÒREADYÓ appeared at the bottom
center of the screen. Then, the single word
ÒCorrectÓ or ÒWrongÓ appeared in the center of the screen, for 350 msec. Subjects
were told to push the corresponding button
as fast as possible. In contrast, with Experiment 2, only two buttons were used in this
experiment (i.e., no Ònot sureÓ option was
provided). Reaction times were recorded.
This task provided practice with the button
box, together with a baseline reaction time
for each subject.
After practice with the button box, subjects were given an opportunity to practice
the judgment procedure. During the sentence practice session, subjects received
twenty trials. The practice sentences were
comparable in length, structure, and error
type to the actual data set, but did not overlap with this data set.
Both the button pressed (Ò GOOD Ó or
ÒBAD Ó) and the reaction time (in msec)
were recorded at each trial. For ungrammatical sentences, reaction time was measured
from the same divergence point as Experiments 1 and 2. For grammatical sentences,
reaction time was measured from sentence
onset.
A trial consisted of the following:
1. The screen was clear for 500 msec.
2. The word ÒREADYÓ appeared near the
bottom center of the screen, for 1000
msec.
3. The screen cleared, and a 2000-msec
pause followed.
4. The sentence appeared in the middle
center of the screen, one word at a
time. Each word appeared for 350
msec, without a pause between
-40-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
words.
5. As soon as subjects had made the
grammaticality judgmentÑeven if
the sentence was still runningÑthey
were to press the appropriate button.
6. At the end of the sentence, the screen
was blank for 3000 msec, during
which time the program would still
accept a button press.
7. This constituted the end of a trial. The
following trial then began, with another 500 msec pause and ÒREADYÓ
cue.
The experimenter instructed subjects to
read the sentences carefully as they appeared on the screen, and to press the button
as quickly as possible after making their decision, even if the sentence was still running. Subjects were instructed to focus on
what they considered proper grammar, and
not on ideal style, punctuation, or spelling,
which were always correct.
The actual experiment consisted of 168
trials of the sentence stimuli described
above. Each subject received the sentences
in a different random order, determined by
the controlling computer program. Subjects
were told that they would receive a break at
the mid-point of the experiment (after trial
84). At this point, instead of the ÒREADYÓ
cue, the subject received a ÒPLEASE WAITÓ
cue.
Data analyses. Two dependent measures were used: AÕ (see above), and reaction time. Reaction times were used only
for correctly answered ungrammatical core
stimuli. These reaction times were measured from the divergence point. As described in Experiments 1 and 2, the
divergence point corresponds to the Þrst
point at which there was a divergence between ungrammatical sentences and their
grammatical controls. Although as the
cloze experiment has shown us there are
still a variety of ways that some of the sentence types might be saved beyond this
point (particularly true if the subjects are
willing to ignore punctuation), this is the
Þrst point at which the error types manipulated in this experiment could conceivably
be detected. Omission, transposition and
agreement errors all share the same divergence point (i.e., they all diverge from
grammatical controls on the same word).
Results and Discussion
Overall accuracy for grammatical and
ungrammatical sentences
All of the analyses which follow are
based upon the thirty-one subjects who remained after the one outlier subject was
dropped. Performance on individual sentences was examined to determine whether
any of the sentences were outliers (deÞned
as an accuracy level more than 2.5 standard
deviations below the mean). As with Experiments 1 and 2, sentence #8.11 (in Appendix I.C) met this criterion, classiÞed as
ungrammatical by only 21% of subjects. So
did sentence #5.1, classiÞed as ungrammatical by only 50% of subjects. These two
sentences are dropped from all further anal-
-41-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Accuracy levels were roughly similar to
those of Experiment 2, suggesting that the
added pressure to respond quickly did not
result in an increased level of error. Subjects were at 90.8% on grammatical sentences and 93.0% on ungrammatical
sentences, which corresponds to an AÕ of
95.6.
An analysis of variance was conducted
on the AÕ scores, treating subjects as a random variable within a 2 ´ 2 ´ 3 within-subjects design. Location of error (early vs.
late), part of speech (auxiliary vs. determiner) and violation type (omission, agreement, transposition) were the factors. This
analysis yielded two signiÞcant main effects, and a signiÞcant interaction: a main
effect of error type (F(2,60) = 4.93, p <
0.011; by items, F(2,110) = 3.69, p < 0.003)
and a main effect of part of speech (F(1,30)
= 5.76, p< 0.0228; by items, F(1,110) =
7.99, p < 0.056). The interaction was part of
speech ´ type (F(2,60) = 3.21, p < 0.047; by
items F(2,110) = n.s). In addition, location
approached signiÞcance (F(1,30) = 3.88, p
< 0.0581; by items, F(1,110) = 16.85, p <
0.0001).
The main effect of part of speech was
due to subjects being more accurate with
auxiliary errors (AÕ=96.5) compared to determiners (AÕ=95.0). The main effect of error type was explored using standard
planned comparisons. Because of our a pri-
ori predictions about Òerror typeÓ difference s , t h e l e s s c o n s e r va t i v e p l a n n e d
comparisons were used to investigate the
main effect of error type over subjects. All
other post-ANOVA analyses use the more
conservative Newman-Keuls test at p <
0.05.
Subjects were signiÞcantly more accurate at detecting transposition errors (mean
A' = 96.6) than they were at detecting errors
of omission (mean A' = 94.8), with agreement in between (mean A' = 95.9) and not
signiÞcantly different from either (this effect also held by items, using NewmanKeuls). This result can be summarized as
transposition > omission (though note our
comparisons of error type at each of the two
levels of part of speech, below).
100
98
A-prime
yses, and the following AÕ scores were calculated with these sentences removed.
96
94
B
auxiliary verbs
O
determiners
B
O
B
O
O
92
90
B
the interaction is significant
omission
agreement
error type
transposition
Figure 12. Visual grammaticality judgement experiment: AÕ by error type and
part of speech
Post-hoc tests were used to explore the
signiÞcant part of speech ´ type interaction
(see Figure 12). Analyzing by part of
speech revealed no signiÞcant effects of
-42-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
> transpositions (1612 msec) > agreement
errors (1110 msec). This proÞle was identically signiÞcant over items. There was also
one signiÞcant interaction: violation type ´
part of speech (F1(2,60) = 3.71, p < 0.031;
F2(2,35) = n.s.).
Visual grammaticality judgement experiment
reaction time by error type, for early errors only
reaction time (msec)
type for auxiliaries, with determiner omissions (mean AÕ = 93.3) signiÞcantly less accurate than determiner transpositions
(mean AÕ = 96.4), over subjects only. Determiner agreement errors were in between
(mean AÕ = 95.3) and not signiÞcantly different from either. A post-hoc analysis by
type of error showed the part of speech difference to be signiÞcant only for omission
errors, with auxiliary omission errors
(mean AÕ = 96.2) signiÞcantly more accurate than determiner omission errors (mean
AÕ = 93.3).
1800
B
O
O
B
1600
1400
O
B
1200
1000
omission
agreement
transposition
error type
early
Reaction times
B
auxiliary verbs
Two analyses of variance were conducted to evaluate reaction times from the divergence point: an analysis over subjects, and
an analysis by items. In addition, analyses
of early and late errors were carried out separately, as in Experiments 1 and 2, for a total of four analyses. The subject analyses
followed the same 2 ´ 3 design, with part of
speech (auxiliary vs. determiner) and violation type (omission, agreement, transposition) serving as within-subject variables.
The item analyses followed a 2 ´ 3 design,
with part of speech and violation type both
between-subjects variables.
O
determiners
For early errors, the analysis over subjects yielded one signiÞcant main effect, for
violation type (F1(2,60) = 58.47, p <
0.0001; F2(2,35) = 29.45, p < 0.0001).
Planned comparisons of the main effect of
type (see Figure 13) showed that this was
due to the pattern omissions > (1777 msec)
Figure 13. Visual grammaticality judgement experiment: reaction time by error
type for early errors only
Post-hoc tests were used to explore the
signiÞcant interaction (see Figure 13). For
the violation type by part-of-speech interaction, for auxiliary verb errors, omissions
(1836 msec) were slower than transpositions (1562 msec), which were slower than
agreement errors (1044 msec). For determiner errors, there was no signiÞcant difference between omission (1719 msec) and
transposition (1663 msec) errors, which
were both signiÞcantly slower than agreement errors (1175 msec). Comparing auxiliary and determiner errors by type of error,
the difference between the two was signiÞcant only for agreement errors, with auxiliary agreement errors (1044 msec) faster
-43-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
then determiner agreement errors (1175
msec).
Turning now to an analysis of the late
errors, there were two main effects, one for
part of speech (F1(1,30) = 13.19, p < 0.001,
F2(2,35) = 5.44, p < 0.03; auxiliary verbs
were at 1052 msec and determiners at 937
msec), and one for violation type (F1(2,60)
= 21.45, p < 0.0001; F2(2,35) = 12.44 p <
0.0001). Planned comparisons of the type
effect showed the effect transpositions
(1160 msec) > omissions (911 msec) =
agreement errors (915 msec). This proÞle
was identically signiÞcant over items.
Reaction times are relatively fast for all
late violations, although post-hoc tests indicate that late transposition errors are still
signiÞcantly slower than the other two lateerror types, which do not differ signiÞcantly from one another. As noted with the earlier experiments, these differences probably
reßect that sentences with a late transposition error tend to be one word longer after
the divergence point than the other late error types, giving the subjects just a little
longer to make up their minds if they are so
inclined. The pattern of responses differs
for early and late errors: for early errors the
reaction time trend can be summarized as
omission > transposition > agreement. For
late errors the trend can be summarized as
transposition > omission = agreement.
It is hopefully clear by now that there
are marked differences in the pattern of results obtained for accuracy vs. reaction time
in this experiment. To examine the nature of
the relationship between speed and accuracy in more detail, we calculated the Pearson
correlation coefÞcient between AÕ scores
and average reaction time across all subjects. This analysis yielded a non-signiÞcant correlation of +0.06. We may conclude
that there is little evidence for a speed/accuracy trade-off over subjects (i.e., it is not the
case that some subjects were sloppier than
others, rushing through the experiment).
Next we calculated the speed/accuracy correlation across all 82 ungrammatical targets
(excluding the two outliers). On this analysis, accuracy was deÞned as percent correct
rejection (recall that A' is not a property of
individual items). The resulting correlation
was positive and signiÞcant at +0.42 (p <
0.001). In other words, there is a speed/accuracy trade-off at the individual-item level. Some items take a longer time to resolve
because subjects are being particularly
careful; other items are resolved quickly,
but they also result in more false negatives
(i.e., they are incorrectly accepted as grammatical).
Correlations and partial correlations
amongst the three experiments
Despite the complexity of these Þndings, one conclusion is very clear: In almost all respects, the reaction time
results obtained with this RSVP technique
parallel results from Experiments 1 and
2 on the size of the decision region that is
observed with word-by-word judgments
of grammaticality. To quantify this intu-
-44-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
ition, we calculated two Pearson correlation
coefÞcients for all 82 ungrammatical targets (with the two outliers removed), comparing the mean reaction time obtained in
Experiment 3 with Experiment 1 and Experiment 2Õs mean words past divergence
pointÑthe average number of words past
the divergence point for each sentence that
subjects Þrst produced an ÒungrammaticalÓ
response. All outlying items were removed
before the correlation was run (deÞned as
any item with a score more than 2.5 standard deviations from the mean). For Experiments 1 and 3, this analysis yielded a
correlation of 0.75 (p < 0.0001). For Experiments 2 and 3, this analysis yielded a correlation of +0.91 (p < 0.0001; not
surprising, since, as reported above, Experiments 1 and 2 also correlated signiÞcantl y ) , w h i c h c o n Þr m s t h a t t h e s e t wo
techniques yield very similar results when
they are applied to the same sentence stimuli. When the same correlation was run separately for early and late errors, to discount
some of the variance caused by late errors
having the advantage of end-of-sentence
cues, the correlation was still high; for early
errors, r = +0.87 (p < 0.0001); for late errors, r = +0.83 (p < 0.0001).6
In addition, we also performed partial
correlations to determine whether the two
word-by-word methods make independent
predictions of the reaction times observed
in Experiment 3. When variance from the
cloze experiment is removed on Step 1, the
partial correlation between Experiment 2
(incremental grammaticality judgment
(GJ)) and Experiment 3 (RSVP) on Step 2 is
0.77, indicating that even after all of the
predictive information offered by the cloze
experiment is accounted for, the incremental GJ experiment still offers additional information about the RSVP experiment.
When the contribution of incremental GJ is
removed on Step 1, the partial correlation
between the Experiment 1 (cloze) and Experiment 3 on Step 2 is 0.02, indicating that
after all of the predictive information offered by the incremental GJ experiment is
accounted for, the cloze experiment offers
(essentially) no additional information
about the RSVP experiment. Thus, both the
cloze and incremental GJ experiments predict reaction time in the RSVP experiment;
however, the incremental GJ experiment is
a better predictor, and completely overlaps
the predictive information offered by the
cloze experiment. Although we cannot be
certain why incremental GJ provides a better Þt to Òone shotÓ, on-line judgments, we
offer two possible reasons. First, Experiment 1 has heavy task demands. Subjects
tend to provide a Òcan't completeÓ response
in the cloze experiment sooner than they
provide an ÒungrammaticalÓ response in
the incremental GJ experiment, perhaps because they Þnd it tiring to generate a grammatical completion on each word, and want
to get each stimulus (and the experiment as
a whole) over with as fast they can without
violating the rules of the game. Second, in
-45-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
the cloze task subjects can provide only one
response at each word. Therefore, for any
one subject sensitivity to, e.g., the number
of potential completions still possible is
lost.
These results also suggest that our
choice of the divergence point for each sentence type is crucial in determining the outc o m e t h a t i s o b s e r ve d w i t h e i t h e r
procedure. At the end of Experiment 2, we
concluded that many error types have no
identiÞable Òdecision pointÓ. Instead, they
are resolved across an extended Òdecision
regionÓ, marked by ample variation over
subjects and items. This was also seen in
Experiment 1. And yet, by deÞnition the
technique requires us to assign a single point in time from which all reaction
times are measured, a point to which we return in the Þnal discussion.
RSVP
Summary of results for Experiment 3
The results of Experiment 3 are complementary in many respects to the results
observed in Experiments 1 and 2:
1. Accuracy. Overall accuracy levels
were very high on Experiment 3, averaging
around 93% correct rejections for ungrammatical stimuli and 91% correct acceptances for grammatical controls. An analysis
of variance on AÕ scores (which corrects for
response bias) suggests that accuracy levels
are higher overall for transposition errors.
The type ´ part of speech interaction suggests that the most vulnerable items (i.e.,
the violations that are most often missed)
are those that involve determiner omissions. The apparent disadvantage for determiner omissions was also found in
Experiment 2, in the Þnal button press measure. Hence the relative vulnerability of determiner omissions appears to be a robust
Þnding.
2. Reaction times. This analysis yielded an array of complex interactions involving location, part of speech and error type.
In general, the fastest reaction times come
from early violations of agreement and late
violations of omission. The slowest reaction times and the largest decision regions
come from early auxiliary omissions. Despite their apparent complexity, these reaction time results are quite compatible with
results from Experiment 2 on the size and
shape of the decision region for each item
type. Indeed, these two indices were significantly correlated (+0.91), suggesting that
the reaction time results obtained in Experiment 3 are a direct reßection of the size of
the decision region for each item type.
-46-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
CONCLUSION
The purpose of this study was to investigate the time course of grammaticality
judgment as a performance domain, applying three different techniques to obtain convergent data: cloze completion, incremental
GJ, and judgments of well-formedness. Our
results include the Þnding that some error
types are associated with a clear-cut Òdecision point,Ó while others are best described
in terms of a protracted Òdecision regionÓ
with ample variability by items and subjects.
The traditional use of reaction time
techniques in cognitive psychology and
psycholinguistics has been to ascertain the
type and number of putative processes involved in some cognitive operation. For diachronic stimuli, this overlooks an
important additional and potentially confounding source of variance: the point at
which relevant information becomes available. For example, if active declarative sentences are processed faster than passive
negative sentences, measured by reaction
time on some task using the sentence, it
could be because the active sentence requires fewer transformations between deep
and surface structure or it could be because
the information needed to successfully
complete the experimental task is available
sooner in the case of active sentences.
By their very nature, then, reaction time
techniques require us to impose two points
on what appears to be a continuous land-
scape: the point from which reaction times
are measured, and the point at which the behavior in question takes place. This methodological fact has serious theoretical
consequences. Obviously, the pattern of reaction times that we observe is entirely determined by the point at which we decide to
start the clock. But, at least for some error
types, we have seen that where we start the
clock is an uncertain thing. If, for example,
we were to use the results of Experiment 2
to design empirical Òdivergence points,Ó
what threshold should we use to indicate
where the ÒpointÓ isÑthe 50% rejection
threshold, the 75% rejection threshold, the
90% threshold? This is further complicated
by the variability in the size of the decision
region between error types. In other words,
the problem is not only that there are differences in absolute reaction times depending
upon the threshold point used, but also that
different choices of threshold will change
the rank order amongst the different error
types. For example, measuring reaction
time in Experiment 3 from the divergence
point (an early threshold) resulted in the
pattern omission > agreement. However,
measuring reaction time from a late threshold, such as the 75% threshold, would provide the exact opposite proÞle, agreement >
omission, for agreement errors, with their
quick resolution in Experiment 2, would
have a 75% threshold at essentially the
same point as the divergence point, while
omission errors, with their protracted decision region, would have a 75% threshold,
-47-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
and a corresponding point from which reaction times were measured, much later.
What are we to do with this insight?
One alternative might be to abandon punctate reaction time techniques altogether, in
favor of methods that provide a more faithful picture of the continuous and probabilistic process that underlies judgments of
well-formedness. This might include selfpaced word-by-word reading, eye movement monitoring, event-related brain potentials, and/or the incremental GJ paradigm
used here. Unfortunately, most of these leftto-right methods are costly, and all of them
are very time-consuming, generating a
large number of data points that are difÞcult
to explore within a standard experimental
design. However, one strength of the techniques used in the Þrst two experiments of
this paper is that the guesses or word completions that subjects provide at each word
provide useful information about the number and range of alternatives that these subjects still have in mind. These completions
provide a relatively faithful reßection of the
competing alternatives from the subjectÕs
point of view.
Our results have shown that for some
error types the divergence point is not necessarily at the same place that the experim e n t e r b e l i ev e s i t t o b e Ñ i n d e e d ,
sometimes there is no one point at all. Our
results have implications for a promising
new research area in psycholinguistics; i.e.,
the use of event-related potentials (ERP) as
an index of sensitivity to semantic or syntactic violations (Hagoort, Brown &
Groothusen, 1993; Neville, Nicol, Barss,
Forster & Garrett, 1991; Osterhout & Holcomb, 1993; Brown, Hagoort, & Vonk,
1995). At the very least, our results suggest
that all future ERP studies (1) pre-test materials using one or a number of the techniques we have uses here in order to
empirically determine the divergence point
(if there is one); (2) investigate ERPs over
the entire course of the sentences, rather
than just at the divergence point (e.g., King
& Kutas, 1995).
For example, Neville et al. (1991)
showed different (though not completely
orthogonal) waveforms for semantic anomalies and three different types of syntactic
anomalies, using this Þnding as support for
the biological reality of semantic vs. syntactic processes as well as for the three different (Government-and-Binding-theorymotivated) syntactic violation types. In
some conditions, the divergence points
were quite punctate, like our agreement errors (e.g., phrase-structure violations such
as, ÒThe scientist criticized MaxÕs of proof
the theorem,Ó) while others had divergence
points that were less certain, like our omission and transposition errors (e.g., subjacency violations such as, ÒWhat was a
proof of criticized by the scientist?Ó). The
data reported ERPs only at the particular
word that the experimenters considered to
be the divergence point; it is possible (and
-48-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
suggested by our research) that (1) subjects
are showing effects of the formedness manipulations at points other than the divergence point and that (2) the waveform
differences may be due to the same factorsÑexpectancy and potential alternate
completions at the particular pointÑwhich
we believe to be affecting the dependent
variables in our experiments. For example,
in the Neville experiment subjectsÕ overt
judgment of grammaticality was not uniform across conditions, ranging from 72%
correct detection of WH-movement violations to 98% correct detection of phrasestructure violations (although they do not
provide signiÞcance levels) As our experiments have shown, differences in potential
alternate completions absent any theoryspeciÞc difference between sentences can
have effects both on the size of the decision
region as well as on the subjectÕs Þnal decision of grammaticality. The differences in
judgment of grammaticality in the Neville
experiment suggest that this may indeed be
at least part of what is happening, and therefore that at least part of the difference in
waveforms may be attributable to the effects of expectancy and potential alternate
completions.
An ERP study by Hagoort, Brown, and
Groothusen (Hagoort et al., 1993) was even
more suggestive that a punctate divergence
point can not always be assumed in such
studies. The authors used both procedures
that we have suggested here, pre-testing
their materials (though in a serial visual
presentation task and not in an incremental
GJ task) and measuring and reporting on
ERPs throughout the sentence. The pretesting revealed effects similar to those of
our experiment: For some error types subjects responded mostly at the divergence
point, while for others responses were more
frequent after the divergence point. Waveform differences between ungrammatical
and control grammatical sentences revealed
signiÞcant differences at the divergence
point, after the divergence point, (including
at the sentence-Þnal position, reminiscent
of our own sentence-Þnal elevated-reaction
time effects) and in some cases before the
divergence point, supporting our contention
that one cannot assume, without empirical
support, that subjects invariably perceive an
ungrammaticality at a particular point.
Implications for aphasia. We chose to
study this particular set of violations for
two reasons: (1) to determine whether the
pattern of errors observed in speech production by aphasic patients can be explained by variations in the degree of
sensitivity displayed by normal listeners
exposed to the same error types, and (2) to
start our on-line investigations of error detection in normals with a well-deÞned set of
minimal contrasts over materials that are
comparable in every other respect. With regard to the Þrst rationale, we have uncovered new information about the processing
characteristics that may make some errors
-49-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
more vulnerable (i.e., harder to detect) than
others, which may in turn help to explain
why aphasic patients are more prone to produce those error types. With regard to the
second rationale, it is now clear (as we suspected from the outset) that these supposed
Òminimal contrastsÓ are really not minimal
at all, because these error types (i.e., agreement, omission and transposition) differ
markedly in the range of alternatives that
are kept open at various points from the divergence point to the end of the sentence.
Starting with the Þrst rationale, it has
been known for some time that aphasic patients tend to produce errors involving
grammatical function wordsÑalthough the
nature of those errors may vary across different types of aphasia (i.e., more errors of
function word omission in non-ßuent patients; more errors of agreement, coupled
with a tendency toward overuse of function
words in the Òempty speechÓ of some ßuent
patientsÑfor a review, see Bates & Wulfeck, 1989a). Furthermore, some function
word errors are very frequent (i.e., agreement errors and omissions), while other are
relatively rare (i.e., transposition errors).
Building on earlier work in the auditory
modality by Wulfeck and her colleagues
(Wulfeck & Bates, 1991; Wulfeck et al.,
1991; Wulfeck, 1987), we hypothesized
that a similar gradient of sensitivity to error
types may be found in grammaticality judgments by normal subjects (i.e., less sensitivity to errors of omission and/or agreement;
more sensitivity to transposition errors). If
this proved to be the case, it would provide
support for the idea that aphasic patients
suffer from deÞcits that affect or interact
with the process by which normal subjects
monitor for errors in their own speech and
the speech of others. Experiment 3 provided some support for this view. Although accuracy levels were very high overall, they
were generally higher for errors of transposition and lower for omission errors, with
agreement errors in between. Hence errors
that are rare in aphasia seem to be easy for
normals to detect, and errors that are common in aphasia tend to be harder to detect.
There are a number of possible explanations for these error type differences. First,
normals and aphasic patients may display
greater sensitivity to transposition errors
because these errors always involve at least
two cues, the ÒholeÓ (i.e., the point at which
subjects realize that an omission may have
occurred) and the displaced element (i.e.,
the moved element is encountered at an unexpected point).
Second, the advantage of transpositions
over omissions might be explained by the
number of bigrams violated in each error
type. If 2.1 is a grammatical string, 2.2 is a
transposition error on that string and 2.3 an
omission error. The transposition error has
three illegal bigrams, ÒACÓ, ÒCBÓ, and
ÒBDÓ, while the omission error has only
one, ÒACÓ.
-50-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
2.1 A B C D E
2.2 A C B D E
2.3 A C D E
Third, we have seen that omissions and
transposition errors both yield a Òdecision
regionÓ that varies in length depending on a
number of factors, while agreement errors
are usually resolved within a short time (often corresponding to a single point). It is
possible that the lengthy decision process
reßected in resolution of omission and
transposition errors is experienced subjectively (albeit unconsciously) as a long period of perturbation. By contrast, the period
of uncertainty associated with agreement
errors tends to be relatively short (assuming
that the subject detects this error in the Þrst
place). If grammaticality judgment is (as
we have proposed) a close relative of the
monitoring processes used in language production and comprehension, then we may
speculate that long periods of uncertainty
are more likely to bring the error above
thresholds of attention. That is, Òbig perturbationsÓ (accompanied by a larger array of
alternative completions) may result in better error detection than Òsmall perturbationsÓ. This result is consistent with the
reaction time differences between error
types at the divergence point in Experiment
2, where transpositions and omissions
showed a reaction time jump while agreement errors remained faster and relatively
constant across the course of the sentence.
Thus, aphasic patients (like normal con-
trols) may be vulnerable to agreement errors because the perturbations produced by
such errors are harder to detect. In regard to
the greater vulnerability of omission errors
as compared to transpositions, Elman (personal communication) reports that simple
recurrent nets (SRNs) trained to anticipate
temporally ordered stimuli also tend to be
more sensitive to transposition errors than
to omission errors (though he cautions that
these Þndings may not be intrinsic to SRNs
but may be dependent upon the particular
tasks that he has trained them upon.) When
such networks have learned a simple grammar, they are more able to continue successfully with the prediction task (i.e.,
recover) when the error is an omission rather than a transposition error. This proÞle, as
Elman points out, indicates a sensitivity to
relative rather than absolute order. The
omission error of 2.2 has three elements in
the wrong absolute position, ÒCÓ, ÒDÓ, and
ÒEÓ, while the transposition error of 2.3 has
only two elements in the wrong position,
ÒCÓ and ÒBÓ. If these networks (and, by extension, our subjects) were sensitive to absolute order, one would expect the opposite
proÞle of sensitivity, with omissions better
detected than transpositions. Should we
continue to Þnd that humans and networks
show similar task proÞles on these sorts of
well-formedness judgments, this is good
evidence that such models organize and
process information in a manner analogous
to humans.
-51-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Given these three possible explanations, why, then, are omission errors relatively common in aphasia (especially nonßuent aphasia)? One possibility is that the
two common error types (agreement and
omission) have a different causal base.
Agreement errors are Òreal missesÓ, observed in most often in ßuent patients because these patients suffer from what can be
characterized as a Òspeed/accuracy tradeoffÓ (Bates, Appelbaum & Allard, 1991;
Bates & Wulfeck, 1989b; Kolk, 1985; Kolk
& Heeschen, 1985; Haarmann & Kolk,
1992). By contrast, omission errors may occur more often in patients who are all too
aware of their limitations, patients who produce an omission error (often with complete awareness) in order to get around
painful output limitations. This is, as noted,
pure speculation at this pointÑbut it is a
possibility worth pursuing.
Aside from their implications for neurolinguistic research, our materials were
chosen to reßect a set of minimal contrasts.
It is now clear that these three error types
yield markedly different performance proÞles despite their superÞcial similarities.
Agreement (or substitution), omission and
transposition errors are often compared and
analyzed together in aphasia research because they do appear to form a natural contrast set (e.g., Miceli, Silveri, Romani &
Caramazza, 1989). However, in all three of
our experiments we have found striking differences in the size and shape of the deci-
sion and reaction time functions associated
with these error types (omission, agreement, transposition), and with variations in
part of speech (auxiliary vs. determiner)
and location (early vs. late). From the alternative completions that subjects offered in
Experiment 1 (the cloze procedure), we
may infer that the critical differences
among these stimuli lie in the number and
range of alternative completions that the
subjects are still willing to entertain. In addition to the well-formed completions that
naive subjects provided in the cloze task,
we also found completions that ought to be
ruled out if subjects were following the
rules of their language in a strict fashion
(e.g., a restrictive relative clause interpretation should not be possible after a proper
noun; a non-restrictive relative clause must
be set off by punctuation). In other words,
some subjects appear to hesitate in classifying a sentence as ungrammatical because of
a partial overlap between ungrammatical
stimuli and legal alternatives in the language. Competing alternatives may die
away slowly; they are not necessarily eliminated in a stepwise fashion, and they may
hang around to cause trouble even though
they do not provide a discrete Òyes/noÓ Þt to
the rules of the language (see also MacDonald, Pearlmutter & Seidenberg, 1994).
This brings us back to the methodological recommendations raised earlier. In particular, we think it would be important to
design stimuli that vary consistently in the
-52-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
number and strength of the alternative interpretations that subjects have in mind at each
point across the course of the sentence. This
approach has been used in studies of sentence comprehension (see, for example, the
large literature on Òminimal attachmentÓ
and other strategies associated with the processing of sentence ambiguitiesÑTaraban
& McClelland, 1988; MacDonald et al.,
1994; Trueswell & Tanenhaus, 1994;
Trueswell, Tanenhaus & Garnsey, 1994;
MacDonald, 1994). It seems likely that this
approach will be equally useful in the study
of grammaticality judgment. The cloze
method used in our Experiment 1 may be
particularly useful in this regard, to make
sure that subjects perceive the same ambiguities that we have in mind in designing
our materials.
With this recommendation, we return to
the original motivation for on-line studies
of grammaticality judgment. For close to
Þfty years, grammaticality judgments by
trained native speakers have been the method of choice for linguists working within
the generative tradition. And yet we still
know very little about the cognitive processes that underlie such judgments, and
thus the factors that may affect them. Our
own work has focused on the judgments
produced by naive listeners, with sentence
materials that are in some sense ÒpretheoreticÓ (i.e., they were not designed to discriminate among current theories of
syntactic structure). However, we believe
that the on-line methods investigated here
could provide a useful adjunct to current
linguistic research, applied to a richer set of
linguistic materials as they are processed by
ÒexpertÓ listeners. Sentences may appear to
be more or less grammatical in orthodox
linguistic research not because of variations
in the number of rules violated (as proposed, for example, by Chomsky, 1965),
but rather because of variations in the number, frequency and nature of the possible
completions and partially overlapping alternatives that native speakers entertain
while each sentence is evaluated.
-53-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
References
¥
¥
¥
¥
¥
¥
¥
¥
¥
¥
¥
¥
Bates, E., & Wulfeck, B. (1989a). Crosslinguistic studies of aphasia. In B. MacWhinney
& E. Bates (Eds.), The crosslinguistic study
of sentence processing, pp. 328-371. Cambridge: Cambridge Univ. Press.
Bates, E., & Wulfeck, B. (1989b). Reply:
comparing approaches to comparative
aphasiology. Aphasiology, 3(2), 161-168.
Bates, E., Appelbaum, M., & Allard, L.
(1991). Statistical constraints on the use of
single cases in neuropsychological research.
Brain and Language, 40, 295-329.
Bates, E., Wulfeck, B., & MacWhinney, B.
(1991). Cross-linguistic research in aphasia:
An overview. Brain and Language, 41(2),
123-148.
BloomÞeld, L. (1961). Language: New York:
Holt, Rinehart and Winston.
Boland, J. E., Tanenhaus, M. K., Carlson, G.,
& Garnsey, S. E. (1989). Lexical projection
and the interaction of syntax and semantics in
parsing. Journal of Psycholinguistic Research, 18(6), 563-576.
Boland, J. E., Tanenhaus, M. K., & Garnsey,
S. M. (1990). Evidence for the immediate use
of verb control information in sentence processing. Journal of Memory and Language,
29, 413-432.
Brown, C., Hagoort, P., & Vonk, W. (1995).
On-line sentence processing: Parsing preferences revealed by brain responses. Eighth Annual CUNY Conference on Human Sentence
Processing, Tucson, AZ, March.
Caplan, D. (1981). On the cerebral organization of linguistic functions: Logical and empirical issues surrounding deÞcit analysis and
functional localization. Brain and Language,
14, 120-137.
Caramazza, A., & Berndt, R. (1985). A multicomponent deÞcit view of BrocaÕs aphasia.
In M. L. Kean (Ed.), Agrammatism. Orlando:
Academic.
Caramazza, A., & Zurif, E. B. (1976). Dissociation of algorithmic and heuristic processes
in language comprehension: Evidence from
aphasia. Brain and Language, 3, 572-582.
Carpenter, P. A., & Just, M. A. (1989). The
role of working memory in language comprehension. In D. Klahr & K. Kotovsky (Eds.),
Complex information processing: the impact
¥
¥
¥
¥
¥
¥
¥
¥
¥
¥
¥
¥
¥
¥
-54-
of Herbert A. Simon, pp. 31-6). Hillside, NJ:
Lawrence Erlbaum.
Chomsky, N. (1957). Syntactic structures. the
Hague: Mouton and Co.
Chomsky, N. (1965). Aspects of the theory of
syntax. Cambridge, MA: MIT Press.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and
reading. Journal of Verbal Learning and Verbal Behavior, 19, 450-466.
Grier, J. B. (1971). Non-parametric indexes
for sensitivity and bias: Computing formulas.
Psychological Bulletin, 75(6), 424-429.
Haarmann, H., & Kolk, H. (1992). The production of grammatical morphology in BrocaÕs and WernickeÕs aphasics: Speed and
accuracy factors. Cortex, 28, 97-112.
Hagoort, P., Brown, C., & Groothusen, J.
(1993). The syntactic positive shift (SPS) as
an ERP measure of syntactic processing.
Language and Cognitive Processes, 8(4),
439-483.
Just, M., & Carpenter, P. (1980). A theory of
reading: From eye Þxations to comprehension. Psychological Review, 87, 329-353.
Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual
differences in working memory. Psychological Review, 99(1), 122-149.
King, J. & Kutas, K. (1995) Who did What
and WhenÉ Journal of Cognitive Neuroscience, 7(3), 376-395.
Kluender, R. (1992). Cognitive constraints
on variables in syntax. Unpublished doctoral
dissertation, UCSD.
Kolk, H. (March, 1985). Telegraphic speech
and ellipsis, Presented at the Royaumont
Conference Centre, Paris, France, March.
Kolk, H., & Heeschen, C. (1985). Agrammatism versus paragrammatism: A shift of behavioral control, Paper presented at the
Academy of Aphasia 23rd Annual Meeting,
Pittsburgh, PA.
Kutas, M., & Kluender, R. (1991). What is
who violating? A reconsideration of linguistic violations in light of event-related potentials. Center for Research in Language
Newsletter, 6(1).
Levelt, W. J. M. (1972). Some psychological
aspects of linguistic data. Linguistische Berichte, 17, 18-30.
BLACKWELL, ET AL.
¥
¥
¥
¥
¥
¥
¥
¥
¥
¥
¥
¥
The Time Course of Grammaticality Judgment
Levelt, W. J. M. (1974). Formal grammars in
linguistics and psycholinguistics. Vol. 3: Psycholinguistic applications. The Hague: Mouton.
Levelt, W. J. M. (1977). Grammaticality,
paraphrase, and imagery. In S. Greenbaum
(Ed.), Acceptability in language. The Hague:
Mouton.
Linebarger, M., Schwartz, M., & Saffran, E.
(1983). Sensitivity to grammatical structure
in so-called agrammatic aphasics. Cognition,
13, 361-392.
MacDonald, M. C. (1994). Probabilistic constraints and syntactic ambiguity resolution.
Language and Cognitive Processes, 9(2),
157-201.
MacDonald, M., Pearlmutter, N. J., &
Seidenberg, M. S. (1994). The lexical nature
of syntactic ambiguity resolution. Psychological Review., 101, 676-703.
Mauner, G. (1992). Syntactic control and the
interpretation of VP anaphors. Paper presented at the NELS 22.
Miceli, C., Silveri, M. C., Romani, C., & Caramazza, A. (1989). Variation in the pattern of
omissions and substitutions of grammatical
morphemes in the spontaneous speech of socalled agrammatic patients. Brain and Language, 36, 447-492.
Neville, H., Nicol, J. L., Barss, A., Forster, K.
I., & Garrett, M. F. (1991). Syntactically
based sentence processing classes: Evidence
from event-related brain potentials. Journal
of Cognitive Neuroscience, 3, 151-165.
Newmeyer, F. (1980). Linguistic theory in
America. New York: Academic Press.
Osterhout, L., & Holcomb, P. J. (1993).
Event-related potentials and syntactic anomaly: Evidence of anomaly detection during
the perception of continuous speech. Language & Cognitive Processes, 8(4), 413-437.
Pollack, I., & Norman, D. A. (1964). A nonparametric analysis of signal detection experiments. Psychonomic Science, 1, 125-126.
Rayner, K., Carlson, M., & Frazier, L. (1983).
¥
¥
¥
¥
¥
¥
¥
¥
¥
-55-
The interaction of syntax and semantics during sentence processing: Eye movements in
the analysis of semantically biased sentences.
Journal of Verbal Learning and Verbal Behavior, 22, 358-374.
Sells, P., Shieber, S., & Wasow, T. (1991).
Foundational issues in natural language processing. Cambridge: MIT Press.
Shankweiler, D., Crain, S., Gorrell, P., &
Tuller, B. (1989). Reception of language in
BrocaÕs aphasia. Language and Cognitive
Processes, 4(1), 1-33.
Taraban, R., & McClelland, J. L. (1988).
Constituent attachment and thematic role assignment in sentence processing: Inßuences
of content-based expectations. Journal of
Memory and Language, 27, 597-632.
Trueswell, J. C., & Tanenhaus, M. K. (1994).
Towards a lexicalist framework of constraintbased syntactic ambiguity resolution. In C.
Clifton, L. Frazier, & K. Rayner (Eds.), Perspectives on sentence processing, pp. 155179. Hillside, NJ: Lawrence Erlbaum.
Trueswell, J. C., Tanenhaus, M. K., & Garnsey, S. M. (1994). Semantic inßuences on
parsing: Use of thematic role information in
syntactic disambiguation. Journal of Memory
and Language, 33, 285-318.
Tyler, L. K. (1992). Spoken language comprehension: An experimental approach to
disordered and normal processing. Cambridge, MA: MIT Press.
Wulfeck, B., & Bates, E. (1991). Differential
sensitivity to errors of agreement and word
order in BrocaÕs aphasia. Journal of Cognitive Neuroscience, 3, 258-272.
Wulfeck, B., Bates, E., & Capasso, R. (1991).
A cross-linguistic study of grammaticality
judgments in BrocaÕs aphasia. Brain and
Language, 41(2), 311-336.
Wulfeck, B. B. (1987). Sensitivity to grammaticality in agrammatic aphasia: processing of word order and agreement violations.
Unpublished Doctoral dissertation, University of California, San Diego.
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Endnotes
1.
Although in some cases comparisons were also
made at sentence end and in some cases one
word before or after as well.
2.
Because the AÕ score is an index of accuracy designed to correct for response bias, on psychological grounds the AÕ score can only be
analyzed over subjects (i.e. treating subjects as
a random variable and items as a Þxed effect),
and not over items (i.e. treating items as a random variable and subjects as a Þxed effect).
Thus, all item analyses for accuracy are for percent correct to ungrammatical.
3.
Used because we were only comparing two
points in each case and because t-tests are less
conservative and provide more power than
Newman-Keuls.
4.
For late auxiliary error reaction times at the Ò>
20%Ó interval, just before the divergence point,
there was a small but signiÞcant difference be-
tween agreement errors (861 ms) and the other
two error types (omission = 777 ms, transposition = 755). For late determiner errors, judgments showed a slight yet signiÞcant increase in
omissions (4%, compared to 1% or less) at the
Ò>40%Ó interval, and reaction times showed for
omissions (819 ms) were signiÞcantly slower
(agreement = 749 ms, transposition = 721 ms);
these effects are most likely spurious, as sentences to this point have no structural differences and the variance at this intervals is quite low.
5.
Note that we are assuming in what follows that
the visual RSVP task is at least in some respects
comparable with auditory processing.
6.
For Experiments 1 and 3, the correlation for early errors was r = +0.70 (p £ 0.0001), and for late
errors r = +0.69 (p £ 0.0001). For Experiments
1 and 2, the correlation for early errors was r =
+0.80 (p £ 0.0001), and for late errors r = +0.76
(p £ 0.0001).
-56-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Appendix I.a. The 12 types of ungrammatical sentence
part of
speech
location of error
type of
error
auxiliary
early
late
omission
Mrs. Brown working * quietly in
the church kitchen.
She is reading that mystery novel
that her mother writing. *
agreement
The writer were * holding a very
big party.
While sitting on the couch, Mr.
LaneÕs daughters was * watching
a movie.
transposi-
Miss Hope sending * was several
green dresses that Lisa had ordered.
While talking to Jane, Joseph
knitting * was a sweater.
omission
Girl * was working quietly near
the small, red house.
The small, thin green vine was
sprouting ßower. *
agreement
A boys * are driving a large van
that the artist has painted.
Larry is saying that his mother
was planting that bushes. *
transposi-
Helicopter * a was hovering
loudly over the army base.
The girls were watching the stars
while camping in desert * that.
tion
determiner
tion
Appendix I.b
Examples of error types in the ungrammatical core stimuli. The asterisk indicates the
location of the error. The number indicates number of words past the logical error point.
The order of error types is omissions, agreement errors, transposition errors in all of the following cells.
Early auxiliary errors
Mrs.
Brown
working *
quietly
in
the
church
kitchen.
-2
-1
0
1
2
3
4
5
The
writer
were *
holding
a
very
big
party.
-2
-1
0
1
2
3
4
5
Miss
Hope
sending *
was
several
green
dresses
that
Lisa
had
ordered.
-2
-1
0
1
2
3
4
5
6
7
8
-57-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Early determiner errors
Girl *
was
working
quietly
near
the
small,
red
house.
0
1
2
3
4
5
6
7
8
A
boys *
are
driving
a
large
van
that
the
artist
has
painted.
-1
0
1
2
3
4
5
6
7
8
9
10
Helicopter *
a
was
hovering
loudly
over
the
army
base.
0
1
2
3
4
5
6
7
8
Late agreement errors
She
is
reading
that
mystery
novel
that
her
mother
written. *
-9
-8
-7
-6
-5
-4
-3
-2
-1
0
While
sitting
on
the
couch,
Mr.
LaneÕs
daughters
was *
watching
a
movie.
-8
-7
-6
-5
-4
-3
-2
-1
0
1
2
3
While
talking
to
Jane,
Joseph
knitting *
was
a
sweater.
-5
-4
-3
-2
-1
0
1
2
3
Late determiner errors
The
small,
thin
green
vine
was
sprouting
flower. *
-7
-6
-5
-4
-3
-2
-1
0
Larry
is
saying
that
his
mother
has
planted
that
bushes. *
-9
-8
-7
-6
-5
-4
-3
-2
-1
0
Those
girls
were
watching
the
bright
lightning
while
camping
in
desert *
that.
-10
-9
-8
-7
-6
-5
-4
-3
-2
-1
0
1
-58-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Appendix I.c: Core stimuli, organized by cell
Underlined sentences are grammatical controls; bolded sentences are grammatical repeats.
1. Early Auxiliary Omission
1.1.
They examining * several expensive old paintings while walking through the art
museum.
1.2.
They were reading several large maps while waiting for the next train.
1.3.
My mother visiting * an expensive and famous plastic surgeon.
1.4.
The man was playing both old and modern piano pieces.
1.5.
Joan making * several big and tasty ice cream drinks.
1.6.
Julie was eating a large, creamy, chocolate and coconut pie.
1.7.
My cousin drawing * three small pictures of his motherÕs new cats.
1.8.
Her mother was reading some old articles on famous Hollywood movie actors.
1.9.
The boy taking * a black feather that the pigeon had dropped.
1.10.
The doctor is reading the medical report that the nurse has written.
1.11.
Mrs. Brown working * quietly in the church kitchen.
1.12.
A small boy was walking slowly down the beach.
1.13.
TomÕs mother forgetting * that he had taken his new car.
1.14.
Several people were saying that fishermen had killed those blue dolphins.
2. Late Auxiliary Omission
2.1.
While sitting on the red sofa, her older friend eating * some cake.
2.2.
While babysitting for their neighbors, Mrs. JohnsonÕs daughters were eating some
candy.
2.3.
Her older brotherÕs first guest drinking * a beer.
2.4.
My young cousinÕs very first dinner party guest was making some drinks.
2.5.
The two very famous Italian chefs making * a salad.
2.6.
The two famous New York chefs were making a cake.
2.7.
In the very big and shady front yard, BillÕs mother picking * flowers.
2.8.
Near the big, old summer house, several animals were drinking water.
-59-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
2.9.
She is reading that mystery novel that her mother written. *
2.10.
They are eating the candy bars that Mrs. Morton has brought.
2.11.
The young, new president of JohnÕs college speaking * briefly.
2.12.
That very old friend of my father's was walking slowly.
2.13.
JohnÕs boss is upset that his new secretary stolen * a typewriter.
2.14.
SamÕs friend is saying that his two sisters have made some cookies.
3. Early Determiner Omission
3.1.
Boy * was entering a contest while staying at the hotel.
3.2.
The girls were eating some fries while waiting for their friends.
3.3.
Girl * was eating some dark chocolate ice cream.
3.4.
The woman was having a very big dinner party.
3.5.
Clerk * was reading several very old and important letters.
3.6.
The woman was painting several very large, colorful pictures.
3.7.
Woman * was watching some orange butterflies in the small back garden.
3.8.
Her mother was reading some old articles on famous Hollywood movie actors.
3.9.
Woman * is visiting the old dairy farm that her father has bought.
3.10.
The clerk is sending several cotton shirts that DorothyÕs mother has ordered.
3.11.
Girl * was working quietly near the small, red house.
3.12.
The balloon was floating slowly through the air.
3.13.
Woman * was saying that her husband had bought several big tomatoes.
3.14.
The man was reading that many people had protested those new taxes.
4. Late Determiner Omission
4.1.
The boy was finding many big sea shells while playing on beach. *
4.2.
They were reading several large maps while waiting for the next train.
4.3.
My new blue and green silk ball gown was costing fortune. *
4.4.
The large and pale gray cruise ship was hitting an iceberg.
4.5.
The small, thin green vine was sprouting flower. *
4.6.
Her two favorite great aunts were making some pie.
-60-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
4.7.
Alice was calling her old college friend at hotel. *
4.8.
Martha was bringing several old dance records to the party.
4.9.
The maid whom Sally has hired is cleaning bathroom. *
4.10.
The woman whom AnneÕs father has hired is cleaning the windows.
4.11.
Two very famous art critics were speaking briefly at museum. *
4.12.
A plane was flying slowly over the old landing strip.
4.13.
The woman was writing that her two daughters had bought car. *
4.14.
The train conductor was saying that some trash had blocked the tracks.
5. Early Auxiliary Agreement
5.1.
The women was * drinking some wine while talking about the movie.
5.2.
The girls were eating some fries while waiting for their friends.
5.3.
The writer were * holding a very big party.
5.4.
The man was playing both old and modern piano pieces.
5.5.
The vine were * growing a few red and yellow flowers.
5.6.
Julie was eating a large, creamy, chocolate and coconut pie.
5.7.
The men was * reading those papers on the train.
5.8.
Martha was bringing several old dance records to the party.
5.9.
She were * seeing the place where her two older sisters had worked.
5.10.
They were visiting the house where NancyÕs parents and grandparents had lived.
5.11.
Soap bubbles was * floating slowly into the summer sky.
5.12.
Honey bees were flying loudly around a large, old oak tree.
5.13.
MikeÕs parents was * hoping that he had passed the final exam.
5.14.
Several people were saying that fishermen had killed those blue dolphins.
6. Late Auxiliary Agreement
6.1.
While sitting on the couch, Mr. LaneÕs daughters was * watching a movie.
6.2.
While babysitting for their neighbors, Mrs. JohnsonÕs daughters were eating
some candy.
6.3.
Some famous old Hollywood actor were * having a party.
6.4.
Several very young children were watching a play.
-61-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
6.5.
The old, red brick houses was * blocking the view.
6.6.
Her two favorite great aunts were making some pie.
6.7.
In Mrs. HartÕs small rose garden, the gardener were * planting bushes.
6.8.
Near the big, old summer house, several animals were drinking water.
6.9.
John is eating the pizza that his mother have * made.
6.10.
They are eating the candy bars that Mrs. Morton has brought.
6.11.
In the bankÕs very large lobby, the men was * talking quickly.
6.12.
In a big, old, red boat, two girls were rowing slowly.
6.13.
Susan is saying that she have * cleaned it.
6.14.
Chris is saying that his mother has bought a house.
7. Early Determiner Agreement
7.1.
Those girl * was visiting Jack while driving through the town.
7.2.
The boy was reading a comic book while standing on the corner.
7.3.
A women * were watching the Fourth of July fireworks.
7.4.
The woman was having a very big dinner party.
7.5.
Two woman * was selling several expensive imported gowns.
7.6.
Those models were wearing that new wave hairstyle.
7.7.
A boys * were feeding the small, brown bird in the yard.
7.8.
Those girls were petting the small, brown cat in the yard.
7.9.
A boys * are driving a large van that the artist has painted.
7.10.
The clerk is sending several cotton shirts that DorothyÕs mother has ordered.
7.11.
Those house * was selling quickly, for very little money.
7.12.
The balloon was floating slowly through the air.
7.13.
Several sailor * was saying that the man had predicted a storm.
7.14.
The man was reading that many people had protested those new taxes.
8. Late Determiner Agreement
8.1.
JimÕs sisters were watching the ocean waves while sitting on that rocks. *
8.2.
Mrs. Taylor was eating a turkey sandwich while talking on the phone.
-62-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
8.3.
The very famous rock singer was performing several song. *
8.4.
My young cousinÕs very first dinner party guest was making some drinks.
8.5.
Mr. HallÕs entire class was watching several cartoon. *
8.6.
The two famous New York chefs were making a cake.
8.7.
ArthurÕs daughters were driving that red sports car over those mountain. *
8.8.
Those girls were petting the small, brown cat in the yard.
8.9.
Several workers whom Mr. Stevens has hired are painting those fountain. *
8.10.
The woman whom AnneÕs father has hired is cleaning the windows.
8.11.
The young man was speaking loudly with two salesman. *
8.12.
A small boy was walking slowly down the beach.
8.13.
Larry is saying that his mother has planted that bushes. *
8.14.
Chris is saying that his mother has bought a house.
9. Early Auxiliary Transposition
9.1.
JaneÕs friends watching * were some fireworks while standing on the hill.
9.2.
Mrs. Taylor was eating a turkey sandwich while talking on the phone.
9.3.
Those girls seeing *were some old and famous silent movies.
9.4.
The artists were selling several small but expensive watercolor paintings.
9.5.
She signing * was her newest and biggest story collection.
9.6.
The woman was painting several very large, colorful pictures.
9.7.
Students writing * are several math problems on the blackboard.
9.8.
JaneÕs mother is renting a small apartment in New York.
9.9.
Miss Hope sending * was several green dresses that Lisa had ordered.
9.10.
JanÕs hairdresser was learning a new look that Jan had wanted.
9.11.
The boy walking * was quickly to the store.
9.12.
The balloon was floating slowly through the air.
9.13.
That woman saying * is that her two friends have stolen several things.
9.14.
SamÕs friend is saying that his two sisters have made some cookies.
-63-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
10. Late Auxiliary Transposition
10.1.
While talking to Jane, Joseph knitting * was a sweater.
10.2.
While babysitting for their neighbors, Mrs. JohnsonÕs daughters were eating
some candy.
10.3.
A small and harmless black dog chasing * was chickens.
10.4.
The large and pale gray cruise ship was hitting an iceberg.
10.5.
My old junior high school friend's favorite little cousin watching * was cartoons.
10.6.
My old army friend's beautiful, bright red sports car was burning oil.
10.7.
In music class, two students singing * were songs.
10.8.
Near the big, old summer house, several animals were drinking water.
10.9.
Horses are eating the sugar cubes that Martin brought * has.
10.10. They are eating the candy bars that Mrs. Morton has brought.
10.11. In a large, old, silver car, several boys driving * were recklessly.
10.12. In a big, old, red boat, two girls were rowing slowly.
10.13. Those pilots were saying that several clouds covered * had the sky.
10.14. The train conductor was saying that some trash had blocked the tracks.
11. Early Determiner Transposition
11.1.
Man * that was reading some books while staying at the hotel.
11.2.
The boy was reading a comic book while standing on the corner.
11.3.
Guest * the was eating a cheese and sausage pizza.
11.4.
The artists were selling several small but expensive watercolor paintings.
11.5.
Students * several were buying some cheap French cheese.
11.6.
Those models were wearing that new wave hairstyle.
11.7.
Women * three are opening a small shop in the city.
11.8.
JaneÕs mother is renting a small apartment in New York.
11.9.
President * the was reading the report that his advisor had written.
11.10. The doctor is reading the medical report that her nurse has written.
11.11. Helicopter * a was hovering loudly over the army base.
11.12. A plane was flying slowly over the old landing strip.
-64-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
11.13. Announcer * the is saying that a big accident has blocked one lane.
11.14. SamÕs friend is saying that his two sisters have made some cookies.
12. Late Determiner Transposition
12.1.
Those girls were watching the bright lightning while camping in desert * that.
12.2.
Mrs. Taylor was eating a turkey sandwich while talking on the phone.
12.3.
The art museumÕs owner was buying paintings * several.
12.4.
Several very young children were watching a play.
12.5.
GeorgeÕs two remaining dinner guests were drinking wine * some.
12.6.
Her two favorite great aunts were making some pie.
12.7.
The magazine reporter was donating one hundred dollars to hospitals * those.
12.8.
The police officer was giving a speeding ticket to that guy.
12.9.
The man whom JackÕs sister has dated is cleaning car * the.
12.10. The woman whom AnneÕs father has hired is cleaning the windows.
12.11. Some drunk men were dancing wildly in streets * the.
12.12. A small boy was walking slowly down the beach.
12.13. Jerry is hoping that his friends have visited doctor * a.
12.14. Chris is saying that his mother has bought a house.
-65-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Appendix I.d: Fillers
Grammatical Þllers
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
She instructed her secretary to hold all calls.
A jeep, the local beach guard noticed, was driving down to the water.
Those teachers were reading.
Driving down the road, he passed a huge grove of pecan trees.
Sherry was eating a pie.
Steve said that he was promoted quickly because he had worked so hard.
Sally believed that she had a detailed knowledge of car engines.
The most recent of the conferences differed from those others on several important points.
Mr Harrison, the Þrst successful publisher and editor of the Times, would seem to
be one of these entrepreneurs.
The weather a week ago Saturday, rain, and lots more rain to come, was depressing.
They have talked to John.
The man displayed a fuzzy toy that delighted the young child.
Once again, Rob planned his vacation late.
They were watching some movies.
While the economy appears sluggish, certain parts are improving.
What began worrying people in town was the opening of a third huge and sprawling shopping mall.
We saw, while visiting the dairy farm, a Holstein cow.
By the time Mrs. London was through, the restaurant had become one of the most
popular spots in town.
Don spoke to her and laughed.
She was trying to Þx up the car.
Jim's cousin was on Jack's mind.
Joy noticed several blue dolphins were playing in the water.
Ungrammatical Þllers
1.
2.
3.
Ellen read, while traveling on the train, several large and complicated company
technical reporters.
Sam appeared to be thinking hardly.
I have remembering that particular watercolor painting because of its sharp and
vivid blues and greens.
-66-
BLACKWELL, ET AL.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
The Time Course of Grammaticality Judgment
Holds in the ship is so big that you could store a house.
Jack was Þxing car a.
Last weeks, Mary and her two brothers saw a bald eagle ßying over the Foothills
Fashion Mall.
Jane have walked.
Will talked has to her.
Walk to that houses.
A horse were running.
Mrs Jones was claiming that by the age of two her daughter Carol walking and
talking was in full sentences.
Three cats drinking.
One in my friends is often working quite late.
Those Þlm director was protesting the destruction of the Amazon rain forest, as
are many well-known artists and writers.
One of Jane's dogs are often playing in the yard.
John seemed to be thinking as he walked aloud.
Other administration ofÞcials calls the Green Berets hostages.
She went at that direction, passing one car as she walked.
Several in the books, said the librarian, were unsuitable for young children.
Her report was so well written that she receiving a promotion.
As a resulting of her ßight delay, Sam's mother was staying in New York an extra
night.
Three thousand dollars were the minimum bid set by the art gallery.
-67-
BLACKWELL, ET AL.
The Time Course of Grammaticality Judgment
Appendix II
Sentences were drawn at random from a pool of seven different sentence types. Each
cell of the design received one of each of the following sentence types. A sentence demonstrating each sentence type is also given.
1.
2.
3.
4.
5.
6.
7.
While clause: ÒJohn was eating some cake while talking to Mary.Ó
SVO with heavy object: ÒHer husband was picking a few small, white and yellow
daisies.Ó
SVO with heavy subject: ÒMy little six-year-old cousin was watching cartoons.Ó
SVO with prepositional phrase: ÒMy friend was reading the paper on the express
bus.Ó
Relative clause: ÒMeg was reading the book that her mother had written.Ó
SV-prepositional phrase with adverb: ÒA balloon was ßoating slowly to the
ground.Ó
Subordinate clause: ÒJack was saying that the teacher has graded the tests.Ó
-68-