(Rene Kager, Joe Pater, Wim Zonneveld) Constraints
Edited by
René Kager, Joe Pater, and Wim Zonneveld
Cambridge University Press
Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press has no responsibility for the persistence or accuracy of URLs
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.
Contents
Abbreviations
Preface
1 Introduction: constraints in phonological acquisition
René Kager, Joe Pater, and Wim Zonneveld
In the first section of this introduction, we show how many issues and ideas
that appear in this volume have important precedents in prior research. We pro-
vide a brief survey of these issues, as they have been discussed in the research
tradition that links phonological theory and phonological acquisition, of which
the Optimality-theoretic approach is the most recent outgrowth. In the second
section, we provide a tutorial on the fundamentals of Optimality Theory,
specifically tuned to its application to acquisition. And in the final section we
give an overview of some central issues in acquisition and learnability in Op-
timality Theory, drawing connections between these issues and the contents of
the chapters of this volume. This last section includes summaries of all of the
chapters except for Lise Menn’s contribution, which itself is a summary and
discussion of research on phonological acquisition and its relation to Optimality
Theory, from a perspective which seems to complement the one taken in this
introductory chapter.
choice it makes between the two consonants appearing in the adult target cluster
is not random, though it does vary from child to child. These mappings were
brought within the purview of phonological analysis with the emergence of the
much more formally oriented generative phonology.
The ‘young child’ appeared for the first time in the generative literature in
Chomsky (1959). This is the passage where (s)he appears:
(3) The child who learns a language has in some sense constructed [a]
grammar for himself on the basis of his observation of sentences and
nonsentences (i.e., corrections by the verbal community). Study of
the actual observed ability of a speaker to distinguish sentences from
nonsentences, detect ambiguities, etc., apparently forces us to the con-
clusion that this grammar is of an extremely complex and abstract
character, and that the young child has succeeded in carrying out what
from the formal point of view, at least, seems to be a remarkable type
of theory construction. Furthermore, this task is accomplished in an as-
tonishingly short time, to a large extent independently of intelligence,
and in a comparable way by all children. [. . .]
The fact that all normal children acquire essentially comparable
grammars of great complexity with remarkable rapidity suggests that
human beings are somehow specially designed to do this, with data-
handling or ‘hypothesis-formulating’ ability of unknown character and
complexity.
These remarks imply a meaningful initial state or faculté de langage or Universal
Grammar (UG), of Chomsky (1965: 5–6, 1968: 27), the study of which formu-
lates the ‘conditions that a system must meet to qualify as a potential human
language, conditions that [. . .] constitute the innate organization that determines
what counts as linguistic experience and what knowledge of language arises
on the basis of that experience’. Universal Grammar is the innate starting point
of language acquisition for each human being, and it is a Jakobsonian concept
in the sense that its elements are hypothesised also to appear as ‘universals’ in
typological language studies.
The child tries to find its way through the data maze assisted by UG, and
constructs an abstract system called a grammar. This grammar is an amalga-
mation of persisting elements from UG (in the way envisaged by Jakobson)
and acquired, language-specific components. It is the linguist’s task to make
sense of the structure of these grammars, and the role UG plays in their growth.
The contributions discussed in this section in some way or other all take up
the challenge posed by Jakobson and Chomsky. Although their topics are
often intertwined, we chronologically subdivide them as follows: first Smith
(1973) and Stampe (1969, 1973a, 1973b); then Braine (1976), Macken (1980),
Kiparsky and Menn (1977), and Menn (1978, 1980), and finally the literature
dealing with parameters starting with Chomsky (1981a, 1981b). These works
raise issues of fundamental importance, arguing directly from observed proper-
ties of the acquisition process. Roughly, these issues can be subdivided into
‘formal’ ones, and ones of ‘substance’. Among the former are formal as-
pects of the rule-based approach towards the phonological component; and
the abstractness of the underlying forms of child speech. Among the latter
are: markedness, typology and language acquisition; rules vs. processes; con-
spiracies and output constraints; parameters; and the perception-production
dichotomy.
It is the contention of Chomsky (1965: 16, 28) that the studies of syn-
tax, semantics, and phonology can all proceed along the lines of reasoning
just described. Chomsky and Halle (1968) is the first sizeable illustration of
this methodology for Generative Phonology. No secret is revealed by the as-
sessment that this work had a very strong ‘formal’ bias, and that the link be-
tween form and substance was underdeveloped. At the heart of this approach
were four formal devices: (i) feature representations using a UG-based set of
phonological features drawn in from the pre-generative era (see Halle 1983);
(ii) derivations mapping lexical representations onto surface ones by rewrite
rules, formulated in a UG-based format and mutually ordered along lines also
specified in UG (‘linear ordering’, with a number of further specifications);
(iii) morpheme structure rules stating redundancies at the level of the lexicon;
and (iv) an evaluation measure also located in UG, evaluating a grammar’s
formal complexity as a function of the number of symbols in it. This measure
selected less complex grammars over more complex ones for any given body
of data. Although it was intended to contribute to an explanation of the
behaviour of native speakers, including acquisitional behaviour, in actual
practice it was only infrequently called upon: formulating any analysis at all
of the empirical data under investigation, given the other three formal devices
and their UG aspects, was already a complex task for the linguist.
A seminal acquisitional study in this framework is Smith (1973), which
also presents a wealth of original data (assembled in the form of a detailed
longitudinal study of the progress of the author’s son Amahl between the ages
of [2;3] and [3;11]). His monograph has two aims. First, to use child data to
argue for the correctness of the view of the grammar as a system of ordered
rules deriving an output form from an underlying form. Second, to endorse the
idea that child grammars just as much as adult grammars are constrained by the
universals of UG. In conformity with the priorities of the times, the principal
universal discussed in Smith’s study is that of rule ordering: just as rules are
(linearly) ordered in an adult grammar, rules can be crucially ordered at certain
acquisitional stages. They can be modified, reordered, or lost at later stages,
until the adult grammar, with its final set of ordered rules, is reached.
Consider Smith (1973: 158):
(4) Chomsky [1967] suggested (p. 105) that “the rules of the grammar
must be partially ordered,” going on to claim that the principle of
rule ordering was an a priori part of the basis which made language
acquisition possible (Chomsky 1967, pp. 127–128). To the extent that
one can establish the psychological validity of the realisation rules
[of Amahl’s grammar] and to the extent that the ordering relations
established among these rules are necessary, so is Chomsky’s claim
substantiated.
The crucial pair in the data is that of puddle and puzzle, whose pronunciations
require a counterfeeding rule ordering (Kiparsky shows that such orderings are
avoided in grammars, and subject to historical change; see also Kiparsky and
Menn 1977); schematically:
(6) by (5a) puddle → pu[g]le
by (5b) puzzle → pu[d]le (→ *pu[g]le)
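The effect of the counterfeeding order in (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: rules (5a) and (5b) are reduced to toy string rewrites (`velarise`, turning a coronal stop into a velar before l, and `stop`, turning the fricative into a stop), the conditioning environment of `velarise` is assumed, the forms are simplified spellings rather than Smith's transcriptions, and all function names are hypothetical.

```python
# Toy rewrite rules standing in for (5a) and (5b); the exact formulations
# and the simplified spellings are illustrative assumptions.

def velarise(form: str) -> str:
    """(5a)-like rule: coronal stop d becomes velar g before l."""
    return form.replace("dl", "gl")


def stop(form: str) -> str:
    """(5b)-like rule: fricative z becomes the stop d."""
    return form.replace("z", "d")


def derive(underlying: str, rules) -> str:
    """Apply each rule exactly once, in the order given."""
    form = underlying
    for rule in rules:
        form = rule(form)
    return form


COUNTERFEEDING = [velarise, stop]  # order required by the child data
FEEDING = [stop, velarise]         # hypothetical reverse order

for word in ("pudl", "puzl"):
    print(word, "->", derive(word, COUNTERFEEDING),
          "| reversed order:", derive(word, FEEDING))
```

Under the counterfeeding order the d created by `stop` comes too late to be velarised, so the two words stay distinct; under the reverse order both would merge.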
This chain shift also shows that the move away from a coronal in the former
case cannot be due to some production inability; the striking characteristic of
this case will be returned to below.
In a phonological system in which alternations based on rich morphology are
(still) virtually lacking, as in early child systems, the nature of the underlying
forms of the system is a potentially controversial issue. Rejecting the perhaps
initially plausible view that the child’s phonology acts as a self-contained sys-
tem with a low degree of abstractness in which the underlying forms are very
close to the output forms, Smith argues that the child’s underlying forms are
generally equivalent to the adult surface forms that the child takes in as his
day-to-day input. This is not in his case just an a priori assumption or just a
first approach to the problem: it is based on a range of evidence showing that
the child actually operates in this manner (Smith 1973: 11). The conversions in
(5a) and (5b) are not just the author’s expository manipulations, but represent
actual hypothesised characteristics of the child’s grammar. Let us briefly review
some of this evidence for such analyses (Smith 1973: 133–148).
First, Amahl must have stored the adult forms because he was able to recog-
nise and discriminate items of the adult language which he himself could not
or did not produce or discriminate in his production. Thus, he could point cor-
rectly to pictures of a mouse and a mouth ‘before he was able to speak at all’ and
when he was ‘still unable to produce the contrast between [s] and [θ]’ (p. 134):
both are [maut] and then [maus] at exactly the same stages. Second, in a case
of so-called ‘phonemic overlap’, the possibility for some of his l-initial output
words to alternate with [r-], and for other l-initial words not to do so, was en-
tirely based on the adult distinction between words beginning with r- and l-,
respectively (so [rait, lait] for right, and [lait] for light), implying an optional
rule operating on adult-like underlying representations. Third, the development
of certain items over time points to the same property of the grammar:
‘before consonant+/l/ clusters appeared at all, there was neutralisation of such
adult examples as bed and bread as [bɛd]. Once clusters appeared these were
differentiated as [bɛd] and [blɛd], respectively, and likewise for many
comparable items’ (p. 139). Fourth, Amahl, as soon as he learned a new sound
or sound combination, immediately utilised it correctly ‘across the board’ in
all the relevant words, rather than incorporating it separately and slowly into
each word. This indicates ‘that these sounds and sound sequences must have
been stored in the brain “correctly” in order for their appearance to be so
consistently right. [Thus,] once [l] appeared for /sl/ it appeared in all words
containing /sl/ nearly at the same time’ (p. 139). Finally, evidence for adult-
like underlying forms comes from early alternations, such as those involving
plural formation (p. 148). Consider the data below, resembling an alternation
such as that produced by final devoicing in languages such as Dutch and
German.
rules, marking conventions can have a ‘linking’ function, making some rules
more costly than others, hence dispreferred in grammars as well as presumably
in the acquisition process. As an example, consider the rounding that often
accompanies the backing of front non-round vowels: given the evaluation mea-
sure, rounding threatens to be more costly than leaving it out. If markedness
convention (9b) links up to the output of a backing rule, it will supply the
roundness feature, whereas blocking of the convention must be achieved by
specifying the marked value in the structural change of the rule.
Smith (1973: 199–201) makes an attempt at establishing the usefulness of
the marking conventions for his results. In spite of the promising beginning of
this attempt quoted in (8a), his conclusion is exactly the opposite: ‘it seems that
with the exception of one or two isolated examples of the type cited, marking
conventions are completely irrelevant’, adding that ‘[i]n general it would seem
that the present state of ignorance makes it impossible to effect any interest-
ing correlation between acquisitional phenomena and marking conventions’.
Anderson (1985: 334–342) makes a more fundamental point. In spite of the
expressed aim of being less ‘overly formal’, Chomsky and Halle’s markedness
theory ‘is in fact an attempt at exhaustively reducing the considerations of pho-
netic content that might be relevant to phonology to purely formal expression in
the notation (now enhanced by its interpretation through the marking conven-
tions). It is thus entirely consistent with the original SPE program of reducing
all of the theory of phonological structure to a single explicit formal system in-
cluding a notation and a calculus for manipulating and interpreting expressions
within that notation. [. . .] The revision involved, however, was in a more com-
plete working out of the goal of reducing phonology to a formal system rather
than a replacement of that goal with some other’ (pp. 333–334). He adds that
‘essentially no substantial analyses of phonological phenomena have appeared
subsequently in which this aspect of the theory plays a significant role. One
MIT dissertation (Kean 1975) was devoted to further elaboration of the theory,
but this remained (like chapter 9 of SPE [The Sound Pattern of English]) at the
level of a programmatic statement rather than constituting an extended analysis
of the phonology of some language(s) in the terms prescribed by the theory.’
Detailed criticism of Chomsky and Halle’s markedness theory was provided
by Stampe (1973a, 1973b). At first glance his distinction between (language-
specific) rules and (universal) processes closely resembles the SPE one be-
tween rules and markedness conventions, but a closer look reveals fundamental
differences. Rules ‘merely govern alternations’ and are more often than not
‘phonetically unmotivated’, his example being the English rule Velar Softening
(Chomsky and Halle 1968) relating ele[k]tric and electri[s]ity: it has excep-
tions in the language, and the change from /k/ to [s] is a far from natural one
in phonetic terms. Processes, on the other hand, ‘reflect genuine limitations on
what we can pronounce’. They are part of the common acquisitional starting
point, i.e., of UG (a term not used in his work):
(10) [I]n its language-innocent state, the innate phonological system ex-
presses the full system of restrictions of speech: a full set of phono-
logical processes, unlimited and unordered. [. . .] A phonological pro-
cess merges a potential phonological opposition into that member of
the opposition which least tries the restrictions of the human speech
capacity. [. . .] Each new opposition the child learns to pronounce
involves some revision of the innate phonological system. [. . .] The
child’s task in acquiring adult pronunciation is to revise all aspects of
the system which separates his pronunciation from the standard. If he
succeeds fully, the resultant system must be equivalent to that of the
standard speakers.
In the view I’m proposing, then, the mature system retains all those
aspects of the innate system which the mastery of pronunciation has
left intact. (Stampe 1969: 443–447)
Language acquisition does not consist merely of acquiring rules (and ordering
them), as in Smith’s case in the Chomsky and Halle framework. In addition to
acquiring phonetically implausible rules such as Velar Softening, the child
manipulates the processes supplied by UG: they can be suppressed, limited, and
ordered when they are in conflict. Conflicts arise, e.g., between absolute and
contextually restricted processes: obstruents are voiceless irrespective of
context because their oral constriction impedes the airflow required by voicing,
but they also prefer being voiced in a voiced environment. Stampe recognises
that to a certain extent his processes resemble Jakobsonian implicational laws,
but the ground covered by the latter is just a subset of that covered by the
‘innate processes’, namely that of ‘the phonemic inventory [. . .] unaffected by
contextual neutralisations’ (1969: 446). The laws are static, whereas processes
are active, and make predictions about representations as well as inventories;
moreover, they may be contextually conditioned. His assessment of Chomsky
and Halle’s markedness theory is that it ‘drastically underestimates the number
of processes which are innate’ and makes ‘totally unsupportable claims about
the nature of phonology’ (1973b: 45–46).
One or two brief examples may clarify how these ideas work in practice,
focusing on language acquisition. Consider Stampe’s comparison of the be-
haviour of coronals in Danish and Tamil (1973a: 14–16). In Danish, posttonic
/t/ is pronounced as [d], which sounds just like pretonic [d]; in addition, posttonic
/d/ is pronounced as [ð]. This is a counterfeeding situation similar to the one
from Smith (1973) described in (8): a process of spirantisation precedes one
of voicing. In Tamil, both processes have ‘unrestricted’ applications: /t/ is both
His illustrations from language acquisition involve examples of ‘one child hav-
ing ordered two processes which another child has not’ and, even, ‘examples
of a child actually performing the ordering’ (Stampe 1969: 447, 1973a: 11–12,
16–17). The structure of two such cases is the following:
(12) (a) Joan Velten (Velten 1943) pronounces lamb as [zab].
This pronunciation results from three distinct processes:
(i) delateralization: /l/ → [j] (this is common in children, cf.
lie → [jai] by Hildegard in Leopold (1947))
(ii) spirantization: /j/ → [ʒ] (again see Hildegard’s you → [ʒu])
(iii) depalatalization: /ʒ/ → [z] (as in Joan’s own wash → [was])
But for Hildegard /l/ remains [j], it does not become spirantized
as in Joan’s speech. The difference between their pronunciations
lies not in the processes but in their order of application. Hildegard
does not apply spirantization to the output of delateralization; Joan
does. Thus whereas Joan cannot pronounce [j] at all, Hildegard
can pronounce it if she attempts to say /l/. This apparent oddity is
no stranger than the Dane’s inability to say postvocalic [d] except
by aiming at [t]. (Stampe 1973a: 16–17)
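The difference between the two children in (12a) can be made concrete with a small Python sketch. It rests on several assumptions that go beyond the quotation: only the word-initial consonant is modelled, each process is a one-segment mapping, the intermediate palatal fricative is written ʒ, Hildegard is assumed to have suppressed depalatalization, and all names are hypothetical.

```python
# Each process maps one segment to another; segments it does not target
# pass through unchanged. Only the initial consonant is modelled.

def delateralise(seg: str) -> str:
    return "j" if seg == "l" else seg


def spirantise(seg: str) -> str:
    return "ʒ" if seg == "j" else seg


def depalatalise(seg: str) -> str:
    return "z" if seg == "ʒ" else seg


def apply_in_order(seg: str, processes) -> str:
    """Run the processes once each, in the child's order of application."""
    for process in processes:
        seg = process(seg)
    return seg


# Joan: each process applies to the output of the previous one.
JOAN = [delateralise, spirantise, depalatalise]
# Hildegard: spirantisation gets its chance before delateralisation,
# so a [j] derived from /l/ escapes it (assumed ordering).
HILDEGARD = [spirantise, delateralise]

for target in ("l", "j"):
    print(target, "Joan:", apply_in_order(target, JOAN),
          "Hildegard:", apply_in_order(target, HILDEGARD))
```

With these orders Joan never surfaces with [j], whereas Hildegard produces [j] only when aiming at /l/, which is the pattern the quotation describes.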
Stampe’s ideas enjoyed considerable popularity in the 1970s; they were seen by
many as a promising variant of the then strong ‘Natural Generative Phonology’
(NGP) movement, and found favour with many acquisitionists (see, e.g., Edwards
and Shriberg 1983). As indicated (in our (8b)), Smith
(1973) alludes to the possibility of reducing (many of) his rules to Stampe’s
processes.
The wisdom of such a move, however, was seriously doubted by detractors
such as Kiparsky and Menn (1977). They argued that a careful look at first
language acquisition and its developmental properties and stages simply does
not support the main tenets of theories such as those of Jakobson and Stampe,
which they lump together as ‘rather deterministic’: ‘In these theories, there is no
“discovery”, no experimentation, no devising and testing of hypotheses. [T]he
child’s speech development cannot simply be viewed as a monotonic approxi-
mation to the adult model.’ Much more typical of the acquisition process is that
it proceeds as a ‘problem solving’ process: ‘learning to talk [. . .] is a difficult
task’ in which ‘the child must discover ways to circumvent the difficulties’.
They see recognisable speech as a target which different children will attempt
to reach by a variety of means, including the ‘rules’ typical of child phonology
(Kiparsky and Menn 1977: 56–58):
(13) [T]here are several ways of dealing with consonant clusters: deletion
of all but one of the phonemes (in a stop-X or X-stop cluster, the one
preserved is not always the stop), conflation of some of the features of
the elements of the cluster (e.g., sm > m, fl > w), insertion of a vowel
to break up the cluster, metathesis (snow > nos), etc. [. . .]
Different children exclude definable classes of output by different
means. When we observe such repeated ‘exclusion’, we conclude that
these classes of outputs (clusters, certain co-occurrences, the ‘third
position’, etc.) represent difficulties to the child, and the various rules
of child phonology (substitutions, deletions, etc.) as well as selective
avoidance of some adult words, are devices the child finds for dealing
with those difficulties. [. . .]
The very diversity and the ‘ingenuity’ of these devices might indi-
cate that early phonology should be regarded as the result of the child’s
active ‘problem solving’.
further issues. First, that of the nature of the underlying forms of a child gram-
mar, also in relation to the perception/production dichotomy; and second, the
notion of parameter as a (partial) replacement of rules. With regard to the for-
mer issue, let us reconsider Smith’s contention that the underlying forms of
the child’s phonological component are constituted, by and large, by the adult
surface forms. This view did not go unchallenged for very long. In a review of
Smith’s monograph, Braine (1976) suspects that the influence of perception on
the underlying form must be larger than assumed by Smith. He argues in favour
of what he calls a ‘partial perception hypothesis’, which states that (1976: 492)
‘the child’s perception of words contains systematic biases, and is therefore
only partly accurate’. His support for this contention contains the following
elements, among others: there ‘is now much evidence, from discrimination
testing, that children’s ability to discriminate perceptually among phonemic
contrasts is far from perfect . . .’. Targeting Smith’s puzzle/puddle chain shift,
he focuses on the fact that the shift away from a coronal stop does not seem
to be motivated by a production difficulty because that same coronal stop is in
the output of the shift away from the fricative: ‘If we drop the assumption that
perception inevitably recovers the adult phonemes, [these] phenomena appear
in a new light. In puddle → pu[g]le, one wonders if A’s auditory system could
be ranking the flapped or glottalized intervocalics as more similar to normal
adult [velars] than normal [coronals]’ (1976: 494). Braine’s reinterpretation of
Amahl’s phonological behaviour implies a model with three separate compo-
nents. First, ‘[a]uditory encoding laws would state how the child’s auditory
system transcribes the acoustic input into auditory attributes’. Second, on the
output side there would be realisation rules similar to Smith’s. Third, the model
contains ‘correspondence rules that map auditory features into the articulatory
features that the child controls (or partially controls). These would specify the
articulatory analysis made of words in the lexical store at any point in develop-
ment, and in effect associate motor commands with the auditory features.’ He
admits that ‘[u]nfortunately, the correspondence layer is unlikely to be well-
defined, given the “rare mistakes of articulatory coding” exhibited by Smith’s
son Amahl’ (and other children, too, presumably).
A multi-step procedure of a different kind but in the same perception/
production area of research was developed in work culminating in Menn (1978).
Her proposal has become known as the ‘two-lexicon model’. A child enters a
perceived form in an input lexicon. These forms undergo reduction processes
expressing the child’s limited abilities; the output of these processes is stored
in an output lexicon, which is completely redundancy-free. For production, a
(hopefully small) set of ‘production rules’ or ‘subroutines’ converts this
representation into the one entering the motor component, containing the
articulatory instructions. In the case of fish, for instance, the perceived form (usually close
to the adult surface form) is entered in the input lexicon. Reduction rules of the
child phonology turn this form into one underlying production: redundancy-free
[ɪ, s] is stored in the output lexicon as an unordered pair of vowel and
fricative, which must be ordered by the ‘production rules’; in this case this
ordering process is governed by a prohibition (output constraint) against frica-
tives in onsets. The principal empirical observation motivating the model is
that of the occasional inertia of the rule-learning process. If a new phonological
rule enters the child phonology, existing pronunciations sometimes persist, as
if output forms serve as independent lexical items (Menn and Matthei 1992:
213):
(16) An example from Daniel (Menn 1971) makes this clear: fairly early
on, Daniel produced ‘down’ and ‘stone’, both very frequent words, as
[dæwn] and [don], respectively. Sometime later, he began to show a
nasal harmony rule, producing ‘beans’ as [minz] and ‘dance’ as [næns].
For a while after this point, ‘down’ and ‘stone’ were maintained in
their nonassimilated forms; then the forms [næwn] and [non] began
to appear, in free variation with them. Finally, [næwn] and [non] were
triumphant – [dæwn] and [don] disappeared.
It has become clear, however, that this two-lexicon design cannot be maintained
in this relatively naive form. Empirical evidence has been put forward that
not the purported output lexicon but simply the ‘classical’ input lexicon is
active when the child starts to develop morphophonemic alternations. Smith’s
data in (6) constitute a case in point, and a similar example, originating from
Stemberger (1993), is represented in Bernhardt and Stemberger (1998: 48–49)
in the following manner:
(17) At 2;10, Gwendolyn generally formed the past tense of vowel-final
words by adding [d], as in adult English. There were two exceptions
to this, however. First, forms that are irregular in adult speech, such as
threw, could be produced correctly as irregulars or could be regularized
(throwed), as is common in child language. Second, words that are
consonant-final in adult speech but were vowel-final in the child’s
speech, such as kiss /kɪs/ [tʰi], did not have -d added in the past
tense [. . .].
The two-lexicon approach seems to predict that all words that are
vowel-final in the child’s speech should be treated the same for the
creation of past tense forms. Since a final /d/ is added to words like
pee, it should also be added to words like kiss [ti]. This prediction
fails, suggesting that the two-lexicon approach is inadequate.
Another argument points out that the model is hard put to account for between-
word phonological processes in child language, especially if the same gener-
alisations cover both words and simple syntactic constructions in early child
speech: a lexical approach seems ill-equipped to capture such cases (Menn and
Matthei 1992: 223). It seems the explanation of the selective propagation of new
child speech rules must lie elsewhere. In the latter paper, Menn and Matthei
turn to ‘connectionist’ models in order to maintain ‘what’s good’ about their
approach. Menn (this volume) explains her most recent position.
Finally, in theoretical linguistics of the 1980s attempts were made to replace
the view of a grammar component as a rule system by one involving a set of
parameters, i.e., a set of choices specified by UG and fixed on a language-
particular basis given linguistic experience in that language. Ideally in this
approach the grammar becomes a collection of fixed parameter-settings. It
was recognised from the very outset that not only does this view provide a
framework for the study of language typology but it also has implications for
the study of language learning (cf. Chomsky 1981a: 8–9, 1981b: 3–4, 1986:
52, 145–6):
(19) [I]f we take all the languages of the world, how many syllable types
must we allow for? The answer is surprising. Three parameters suffice
to define every extant syllable type. [. . .] They are:
(10) (1) Does the rime branch? [no/yes]
(2) Does the nucleus branch? [no/yes]
(3) Does the onset branch? [no/yes] [. . .]
(13) (a) 000 No branching rimes, nuclei or onsets: Desano
(b) 100 Branching rimes, but no branching nuclei or onsets: Quechua
(c) 1[1]0 Branching rimes and nuclei, no branching onsets: Arabic
(d) 101 Branching rimes and onsets, no branching nuclei: Spanish
(e) 111 Branching rimes, nuclei and onsets: English
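As a rough illustration of how three binary parameters carve up the space of syllable types, the following Python sketch decodes a three-digit setting into a maximal syllable template. The digits are read in the order of the three questions in the quotation (rime, nucleus, onset), so that Arabic, described as having branching rimes and nuclei but not onsets, is encoded as 110. The template notation (C, V) and the names `SyllableParameters`, `from_code` and `maximal_template` are assumptions introduced here, not part of the quoted proposal.

```python
from typing import NamedTuple


class SyllableParameters(NamedTuple):
    branching_rime: bool
    branching_nucleus: bool
    branching_onset: bool


def from_code(code: str) -> SyllableParameters:
    """Decode a setting such as '101' (rime, nucleus, onset)."""
    if len(code) != 3 or set(code) - {"0", "1"}:
        raise ValueError(f"expected three binary digits, got {code!r}")
    return SyllableParameters(*(digit == "1" for digit in code))


def maximal_template(p: SyllableParameters) -> str:
    """Largest syllable the settings allow, in a simplified CV notation."""
    onset = "CC" if p.branching_onset else "C"
    nucleus = "VV" if p.branching_nucleus else "V"
    coda = "C" if p.branching_rime else ""  # assumption: branching rime = coda
    return onset + nucleus + coda


# Settings as described in the quoted list.
LANGUAGES = {"Desano": "000", "Quechua": "100", "Arabic": "110",
             "Spanish": "101", "English": "111"}

for language, code in LANGUAGES.items():
    print(language, code, maximal_template(from_code(code)))
```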
The task of the language learning child is to fix the parameter settings. It is
usually assumed that parameters are entered in UG with a ‘default setting’.
Parameter (10.1), for instance, covers the difference between open and closed
syllables, but languages actually come in two types: those with open syllables
and those with open and closed ones. The ‘poverty of the stimulus’ argument
applied to this case results in the default setting of no for this parameter:
the presence of closed syllables can then be learned on the basis of positive
evidence. Among the many findings of Fikkert (1994), an elaborate study of
the acquisition of Dutch prosody in this framework, is that in this language,
which has branching rhymes, young children first omit word-final consonants
([pu ] for poes ‘pussycat’, [ka ] for klaar ‘ready’), and start producing them
some two months later. This can be seen as the setting of this parameter away
from the default value.
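The logic of a default setting that is revised on positive evidence can be sketched as follows. This is a deliberately minimal Python illustration, not a model of Fikkert's analysis: syllables are plain non-empty strings, any syllable ending in a non-vowel letter counts as closed, and the names `is_closed` and `set_rime_parameter` are hypothetical.

```python
VOWELS = set("aeiou")


def is_closed(syllable: str) -> bool:
    """A syllable counts as closed if it ends in a consonant letter."""
    return syllable[-1] not in VOWELS


def set_rime_parameter(syllables, default: bool = False) -> bool:
    """Return the branching-rime setting after exposure to the input.

    The learner only reacts to positive evidence: a closed syllable in
    the input switches the parameter to 'yes'. Nothing in the input can
    switch it back to 'no'.
    """
    branching_rime = default
    for syllable in syllables:
        if is_closed(syllable):
            branching_rime = True
    return branching_rime


print(set_rime_parameter(["pa", "ta"]))                # stays at default 'no'
print(set_rime_parameter(["pa", "tak"]))               # reset by a closed syllable
print(set_rime_parameter(["pa", "ta"], default=True))  # a wrong 'yes' is never corrected
```

The third call shows why the default has to be 'no': a learner starting from 'yes' in a language with only open syllables would never meet evidence that forces a retreat.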
Similar parameters have been proposed for the typology of word stress systems
(Hayes 1980/81, Prince 1983), and the acquisition of word stress has been
studied in considerable detail in Dresher and Kaye (1990), Fikkert (1994),
and Dresher (1999), from both the acquisitional-empirical and the
principled-theoretical angle. In spite of studies such as these, it seems that
the parametric approach to acquisition has not (yet) grown to its full
potential, and parameters are sometimes seen as a poor man’s principles: fixed
parameter settings are intended to represent alternatives to rules (such as stress rules in
the Chomsky and Halle mould) but never fully replace them (see Piggott 1988),
and from the UG point of view parameters could be seen as ‘failed principles’,
cf. Archangeli (1997: 26):
[Figure: diagram listing candidate outputs (candidate output 2, ..., candidate output n); other labels not recoverable]
harmonic’ in all its pairwise competitions with other candidates; in each pair-
wise competition, the more harmonic candidate is the one that performs better
on the highest-ranking constraint that distinguishes between them (McCarthy
2001: 3). The optimal candidate b beats its competitor a as it performs better
on the highest-ranking constraint distinguishing between them, top-ranked
C1. The winner also outperforms candidate c as it has fewer violations of the
highest-ranking constraint distinguishing between them, C2.
(22) Simple constraint interactions
              Constraint 1   Constraint 2   Constraint 3
Candidate a   *!
Candidate b                  *              *
Candidate c                  **!
This tableau shows that the optimal candidate b is not the one having no or
the smallest number of violation marks across columns. According to such
a criterion, candidate a would have been the winner. Instead what matters is
seriousness of violations, relativised to constraint ranking: Competitor a is
eliminated due to its single violation of a top-ranked constraint C1 . Also, it is
not the number of constraints violated by a candidate which matters, but rather
the distribution of marks over cells: Candidate c loses because of its double
violation of a single constraint C2 , even though it has no violations of C3 .
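The evaluation procedure described here amounts to a lexicographic comparison of violation profiles. The following is a minimal sketch in Python, assuming the violation counts of tableau (22); the names `TABLEAU`, `optimal`, and `fewest_marks` are our own illustrative choices and not part of the theory.

```python
# Violation counts per candidate, listed in ranking order (C1 >> C2 >> C3),
# as in tableau (22).
TABLEAU = {
    "a": (1, 0, 0),
    "b": (0, 1, 1),
    "c": (0, 2, 0),
}


def optimal(tableau):
    """Return the candidate with the lexicographically smallest profile.

    Python compares tuples position by position, so the first
    (highest-ranked) constraint on which two candidates differ decides.
    """
    return min(tableau, key=tableau.get)


def fewest_marks(tableau):
    """The criterion rejected in the text: the total number of marks."""
    return min(tableau, key=lambda cand: sum(tableau[cand]))


print(optimal(TABLEAU))       # b
print(fewest_marks(TABLEAU))  # a
```

Counting marks across columns picks candidate a, whereas strict domination picks candidate b, which is the contrast the tableau is meant to bring out.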
Many researchers assume that all constraints in grammars of natural lan-
guages are part of UG’s universal inventory of constraints called Con (Prince
and Smolensky 1993). According to this view, grammars differ exclusively in
the ranking of constraints. The central assumption that typological variation
is due to differences in ranking between constraints in a universal inventory
has consequences for language acquisition, as it restricts the learner’s search
space, while establishing a direct relation between phonological typology and
acquisition.
The alternative view on the status of constraints is that these emerge from ar-
ticulatory and perceptual factors which are active during acquisition (Boersma
1998, Hayes 1999). This functional approach also predicts a strong relation be-
tween typology and acquisition, because universal functional factors govern the
process of selection of constraints by the learner. This implies that constraints,
the ingredients of typology, should not differ between languages in arbitrary
ways. (See section 3.2 for discussion.)
On the standard view (originating with Prince and Smolensky 1993) con-
straints in Con fall into two broad classes, known as markedness constraints
and faithfulness constraints. Interactions of these constraint types model the
extent to which marked structures of certain kinds are allowed in a language.
the unmarked structure. This can be illustrated for nasality in vowels, a marked
property on typological grounds: all languages have oral vowels, whereas not
all languages have nasal ones. The markedness constraint militating against
nasal vowels, *VNas ‘no nasal vowels’, competes with a faithfulness constraint
Ident-IO(nas), requiring all surface vowels to preserve the specification of
[nasal] of their input correspondents. Simple permutation of these constraints
produces two grammars, one suppressing nasality where it is specified in the
input, and another grammar which allows nasality of input vowels to surface,
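cf. (23).
(23) a. Grammar 1: *VNas » Ident-IO(nas)
        Input: /bã/   *VNas   Ident-IO(nas)
        bã            *!
        ba                    *
     b. Grammar 2: Ident-IO(nas) » *VNas
        Input: /bã/   Ident-IO(nas)   *VNas
        bã                            *
        ba            *!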
Grammar 1, which ranks the markedness constraint *VNas above the faithful-
ness constraint Ident-IO(nas), effectively prohibits a contrast between oral
and nasal vowels. This contrast is supported in Grammar 2, which has the re-
verse ranking. Which features are ‘contrastive’, and which are ‘noncontrastive’,
then depends on the ranking of specific markedness constraints and faithfulness
constraints: F » M supports contrasts, while M » F neutralises contrasts. (To
capture contextual neutralisation, as well as allophonic variation, markedness
constraints must be relativised to context.)
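The effect of permuting the two constraints can be checked mechanically. The sketch below is an illustration under simplifying assumptions: forms are tuples of segments, the ASCII symbol `a~` stands in for the nasal vowel, and the candidate set is limited to the two forms under discussion. All function and variable names are hypothetical.

```python
from itertools import permutations

ORAL = ("b", "a")
NASAL = ("b", "a~")          # "a~" stands in for the nasal vowel
CANDIDATES = [ORAL, NASAL]


def is_nasal(seg):
    return seg.endswith("~")


def star_v_nas(inp, out):
    """*VNas: one mark per nasal vowel in the output."""
    return sum(is_nasal(seg) for seg in out)


def ident_nas(inp, out):
    """Ident-IO(nas): one mark per segment whose nasality differs from the input."""
    return sum(is_nasal(i) != is_nasal(o) for i, o in zip(inp, out))


CONSTRAINTS = {"*VNas": star_v_nas, "Ident-IO(nas)": ident_nas}


def evaluate(ranking, inp):
    """Pick the candidate whose violation profile is best under the ranking."""
    return min(
        CANDIDATES,
        key=lambda out: tuple(CONSTRAINTS[name](inp, out) for name in ranking),
    )


def contrast_survives(ranking):
    """True if oral and nasal inputs map onto different outputs."""
    return evaluate(ranking, ORAL) != evaluate(ranking, NASAL)


for ranking in permutations(CONSTRAINTS):
    status = "contrast" if contrast_survives(ranking) else "neutralised"
    print(" >> ".join(ranking), "->", status)
```

Under M » F both inputs surface with an oral vowel, while under F » M each input surfaces faithfully, so the contrast is supported only by the second ranking.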
As compared to earlier theories (in particular, standard Generative Phonol-
ogy), OT directly encodes markedness into grammars, with markedness con-
straints constituting the substance out of which phonologies are built. Con-
sequently, the markedness (or ‘naturalness’) of phonological processes and
segment inventories need no longer be attributed to a grammar-external evalua-
tion measure, as it had been in SPE. While deviating from classical Generative
Phonology, OT is on a par with Natural Phonology (see section 1) in giving a
central function to markedness principles. OT differs from Natural Generative
Phonology, however, by taking hierarchically ranked constraints rather than
linearly ordered (natural) processes to be the core device.
Note how the same ranking accounts for the static generalisation holding for
tauto-morphemic clusters (such as dokter). Under Richness of the Base,
conditions on lexical representations are not required, nor can they be stated.
Regardless of whether input clusters have (dis-)harmonic voicing specifications,
the grammar forces their surface correspondents to be harmonic:
(25) Voicing assimilation in ‘static’ mode (tauto-morphemic context)
Input: /dɔktər/   Agree-Voice   Ident-IO(voice)
[dɔkdər]          *!            *
[dɔktər]
Input: /dɔkdər/   Agree-Voice   Ident-IO(voice)
[dɔkdər]          *!
[dɔktər]                        *
Whereas the harmonic input /dɔktər/ is faithfully mapped onto a licit output,
a hypothetical disharmonic input /dɔkdər/ cannot surface unmodified
(*[dɔkdər]), and undergoes voicing assimilation.9 In sum, voicing assimilation
need not be stated twice, but acquires the status of a single grammatical
generalisation, which solves the duplication problem.
The surface-oriented architecture of Optimality Theory also offers a solution
for another problem for classical Generative Phonology, known as the con-
spiracy problem, noted in section 1. Kisseberth (1970) observed that within
grammars different rules conspire towards a common goal: they collectively
avoid a ‘marked’ pattern (for example, CCC clusters are broken up by epenthe-
sis, or reduced by consonant deletion) or establish an ‘unmarked’ pattern (for
example, syllable onsets are created by consonant epenthesis, vowel coales-
cence, etc.). Conspiracies are a problem for classical derivational phonology:
the functional unity between conspiring rules is evident, but left without any
First, the grammar guarantees that licit outputs are faithfully mapped from
segmentally identical input forms. In such cases, the grammar functions solely
as a passive filter, licensing lexical items of the language, and allowing the lexi-
con to be productively extended by items conforming to the language’s phono-
logical requirements. When the grammar maps an input onto an unchanged
output, it performs what we may refer to as an ‘identity’ mapping. We may now
define the notion ‘possible word’ in a language as an output that, when taken
as an input, would undergo the identity mapping. Hence, to subject a form F
to the ‘possible word test’, F is submitted as an input to the grammar. If F is
mapped onto an output form F′ which is non-distinct from its input, F passes
the test, showing that F is a possible word of language L.10
Second, the grammar functions actively as a filtering device by prohibiting
illicit forms from surfacing. Note that the grammar does not filter out an illicit
form by ‘blocking’ (prohibiting it from appearing at the surface), but rather
by mapping it onto a modified licit form, a non-identity mapping. Submit-
ting an illicit form F to the ‘possible word’ test, we feed it into the grammar
as an input, which maps it onto an output F′ which is distinct from F. (In a
tableau, this shows by violation marks incurred by F′ on one or more faithfulness
constraints.)
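The ‘possible word test’ can be stated as a one-line check once a grammar is available as a function from inputs to outputs. The sketch below assumes a deliberately crude stand-in grammar (`toy_grammar`, which merely devoices a word-final obstruent); this function and the `DEVOICE` table are hypothetical and serve only to make the test runnable.

```python
# Hypothetical voiced-to-voiceless pairs for the toy grammar below.
DEVOICE = {"b": "p", "d": "t", "z": "s", "v": "f"}


def toy_grammar(form):
    """Stand-in for a full grammar: devoice a word-final obstruent."""
    if form and form[-1] in DEVOICE:
        return form[:-1] + DEVOICE[form[-1]]
    return form


def possible_word(form, grammar):
    """A form passes the test if the grammar maps it onto itself."""
    return grammar(form) == form


print(possible_word("bet", toy_grammar))  # True: identity mapping
print(possible_word("bed", toy_grammar))  # False: mapped onto "bet"
```

The illicit form is not blocked; it is mapped onto a modified licit form, which is what makes it fail the identity check.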
There are various sources of evidence for non-identity mappings. The first,
classical type of evidence comes from automatic alternations in the shapes of
morphemes that depend on phonological context, such as alternations in voicing
of obstruents in Dutch, triggered by the markedness constraint Agree-Voice.
Another type of evidence for non-identity mappings comes from loanword
adaptation, the familiar phenomenon that words borrowed from another lan-
guage are modified so as to meet an inviolate phonological requirement of
the borrowing language. For example, English speakers tend to repair onset
clusters which are phonotactically illicit in their language, such as /kn/, by
vowel epenthesis (e.g., Evel K[ə]nievel). Loanword adaptations, because of
their automatic, forced character and the broad consistency regarding choice of
repair strategy (e.g., deletion, insertion, featural change of segments) applied
by different speakers, give evidence for the view that they are due to an internalised
grammatical system (Hyman 1970).11
Additional evidence for the automatic nature of non-identity mappings comes
from second language phonology. For example, Dutch learners of English char-
acteristically display final devoicing, neutralising contrasts such as bit ver-
sus bid. This shows the familiar effect of transfer of the first language (L1)
into a second language (L2), in this case the M » F ranking for final devoic-
ing: *VoicedCoda » Ident-IO(voice). More strikingly, cases are known
in which L2 learners display similar mappings which apparently cannot be
explained by transfer from their native language. (See discussion in Stampe
1969, Donegan and Stampe 1979, and section 4.4 below.) For example, many
Cases of the latter type are well known in reduplication systems (McCarthy
and Prince 1995, 1999). In CV reduplication, a typologically common type,
the affix copies a segmental portion from the base which equals a single open
syllable (CV). For example, in Nootka (Stonham 1990), the reduplicated form
či-čims-’i: ‘hunting bear’ has a CV affix [či] which copies a substring of its
base [čims’i:]. This preference for unmarked syllable structure in the affix,
showing the activity of NoCoda (constraint M in 29), is not a general property
of Nootka, however, a language which otherwise allows for closed syllables.
This property is due to domination of NoCoda by Max-IO (constraint C in
29), which prohibits C-deletion as the general means of attaining open syllables.
The CV affix in reduplication is not due to deletion, however. Due to the copying
nature of reduplication, the affix lacks a proper lexical segmental representation.
The syllabic shape of the Nootka affix, unchecked by Max-IO, promptly
gravitates to CV to satisfy NoCoda in an emergence of the unmarked. In this
example, the role of the dominated faithfulness constraint (F in 29) is taken by
Max-BR, requiring that ‘Every segment in the Base have a correspondent in
the Reduplicant’ (McCarthy and Prince 1995).
(30) The emergence of the Unmarked in Nootka reduplication
Input: /Red-čims-’i:/     Max-IO   NoCoda   Max-BR
(a) či-čim.s’i:                    **       ****
(b) čim.s’i:-čim.s’i:              ***!*
(c) či-či                 *!***
Computing the factorial typology for a set of constraints allows testing its
adequacy against typological evidence: in principle, every distinct ranking pre-
dicted by the factorial typology should match (part of) the grammar of some
natural language. Factorial typologies, however, grow quickly with the size
of the constraint set, for n constraints can be ranked in n! different ways.
Since the factorial typology for a set of the size of Con is huge, the task of
typologically verifying all predicted grammars poses many problems. Never-
theless, if we consider smaller sets, concentrating on a single typologically
variant property (for example, syllable typology or stress typology), factorial
typologies usually shrink to sizes small enough to allow for full typological
verification.
Consider, for example, a factorial typology of three constraints involved in
patterns of obstruent voicing. In addition to the constraints NoVoicedCoda
and Ident-IO(voice) that were discussed earlier, we assume a third constraint,
the Voiced Obstruent Prohibition (VOP), a general markedness con-
straint banning voiced obstruents across the board. There are six logically pos-
sible rankings, whose characteristic patterns are indicated. Only three distinct
patterns emerge:
All three patterns are typologically attested, in languages such as Finnish (32a),
Dutch (32b), and English (32e). The full typology of obstruent voicing pat-
terns is, of course, descriptively richer, implying that more constraints need to
be assumed than those considered here (see, e.g., Lombardi 1999). Still, this
example serves to illustrate a general point: since many of the rankings in facto-
rial typologies collapse into a single pattern, the class of typologically predicted
patterns is much smaller than the number of rankings.
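The collapse of six rankings into three patterns can be verified by brute force. The following sketch rests on several assumptions of our own: a toy form is a pair (onset voiced, coda voiced) for a CVC word with obstruents in both positions, every such pair counts both as a possible input and as a candidate output, and the three constraints are given the simplest definitions compatible with the text. The helper names are hypothetical.

```python
from itertools import permutations, product

# Toy forms: (onset_voiced, coda_voiced). Assumption of this sketch only.
FORMS = list(product([True, False], repeat=2))


def no_voiced_coda(inp, out):
    return int(out[1])                      # one mark if the coda is voiced


def ident_voice(inp, out):
    return sum(i != o for i, o in zip(inp, out))   # one mark per voicing change


def vop(inp, out):
    return sum(out)                         # one mark per voiced obstruent


CONSTRAINTS = {
    "NoVoicedCoda": no_voiced_coda,
    "Ident-IO(voice)": ident_voice,
    "VOP": vop,
}


def grammar(ranking):
    """Map every input in FORMS to its optimal output under the ranking."""
    def winner(inp):
        return min(
            FORMS,
            key=lambda out: tuple(CONSTRAINTS[c](inp, out) for c in ranking),
        )
    return tuple(winner(inp) for inp in FORMS)


def factorial_typology():
    """Group the n! rankings by the input-output mapping they generate."""
    patterns = {}
    for ranking in permutations(CONSTRAINTS):
        patterns.setdefault(grammar(ranking), []).append(ranking)
    return patterns


typology = factorial_typology()
for mapping, rankings in typology.items():
    print(len(rankings), "ranking(s) -> surface inventory", sorted(set(mapping)))
print(len(typology), "distinct patterns")
```

With these definitions the three rankings in which VOP dominates Ident-IO(voice) without an intervening effect of faithfulness yield no voiced obstruents at all, the two rankings headed by Ident-IO(voice) yield a full contrast, and the single remaining ranking yields coda devoicing with an onset contrast.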
Testing new constraints by calculating their factorial typologies (in inter-
action with a set of well-established constraints) is methodologically useful,
if not imperative. If constraints are universal (i.e., present and ranked in ev-
ery grammar), then adding a new constraint to the universal inventory may
increase the predicted typology. Hence, the merits of a constraint cannot be
with the task of learning the distribution of voiced and unvoiced obstruents
in Dutch. The generalisation that holds here is that coda obstruents are de-
voiced, while elsewhere (that is, in onsets), voicing is contrastive. In Dutch,
the Voiced Obstruent Prohibition (VOP) is dominated by Ident-
IO(voice) because voicing is contrastive (in onsets). The target ranking is given
below:
(33) NoVoicedCoda » Ident-IO(voice) » VOP
The following tableau shows the relevant constraint interactions in a single
form:
(34) Coda devoicing
Input: /bεd/   NoVoicedCoda   Ident-IO(voice)   VOP
(a) bεd        *!                               **
(b) bεt                       *                 *
(c) pεd        *!             *                 *
(d) pεt                       **!
Note that in the optimal candidate (34b), the coda is devoiced, while the onset
consonant is faithful to its input voicing, which shows the activity of input-to-
output faithfulness.
Let us abstract away from alternations, and assume that the learner already
knows the underlying representation /bεd/. (The learning of alternations and
underlying representations is discussed by Tesar and Smolensky 2000, and
Hayes, chapter 5 this volume.) Under these somewhat simplified conditions,
the learning task amounts to inferring the constraint ranking (33) on the basis
of forms encountered in the input, such as [bεt].
The learning process starts from an initial ranking in which all three con-
straints cluster together in a single stratum. The assumption of a one-stratum
initial state will be reconsidered in the next section, in favour of an initial state
in which markedness constraints outrank faithfulness constraints, as proposed
in Smolensky (1996a, 1996b).
The learner encounters the first datum: [bεt], which (as we assumed earlier)
(s)he knows is based on the underlying representation /bεd/. Since the observed
output form [bεt] must be optimal for the given input, any other candidates for
the same input can be safely assumed to be less harmonic, that is, sub-optimal.
She starts by arranging the information to be processed by the constraint ranker
in the form of mark-data pairs, consisting of pairwise comparisons of the
optimal candidate (the winner) and a sub-optimal candidate (a loser):
Shared constraint violations (between a winner and a loser) are cancelled out.
Consequently, the second mark-data pair ceases to be informative, as [pεd]
incurs a superset of the violations of [bεt], and so cannot (by definition) give any
information about ranking. New mark-data pairs are drawn up, including (all
and only) relevant and non-redundant information:
(36) Mark-data pairs after marks cancellation
sub-opt ≺ opt loser-marks winner-marks
For example, if the learner starts with the second mark-data pair in (36),
working from a single-stratum initial state, (s)he can safely decide to demote
VOP below Ident-IO(voice), placing it into a new stratum.
(37) {NoVoicedCoda, Ident-IO(voice)} » VOP
Continuing with the first mark-data pair, the learner is faced with a dilemma:
she can demote Ident-IO(voice) below NoVoicedCoda, or alternatively,
demote it below VOP:
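One way of resolving such a choice is the demotion principle of the Constraint Demotion algorithm of Tesar and Smolensky: each winner-mark constraint that is not yet dominated is demoted to the stratum immediately below the highest-ranked loser-mark constraint. The sketch below is a simplified illustration under that assumption, using the violations of tableau (34), ASCII `e` in place of the vowel, and hypothetical function names; it is not a reproduction of the published algorithm.

```python
from collections import Counter

# Violations for input /bed/, as in tableau (34).
VIOLATIONS = {
    "bed": Counter({"NoVoicedCoda": 1, "VOP": 2}),
    "bet": Counter({"Ident-IO(voice)": 1, "VOP": 1}),
    "ped": Counter({"NoVoicedCoda": 1, "Ident-IO(voice)": 1, "VOP": 1}),
    "pet": Counter({"Ident-IO(voice)": 2}),
}


def mark_data_pairs(winner, violations):
    """Pair the winner with each loser and cancel shared marks."""
    pairs = []
    for loser, marks in violations.items():
        if loser == winner:
            continue
        loser_marks = marks - violations[winner]    # Counter subtraction cancels shared marks
        winner_marks = violations[winner] - marks
        if winner_marks:                            # pairs without winner marks are uninformative
            pairs.append((set(loser_marks), set(winner_marks)))
    return pairs


def stratum_of(hierarchy, constraint):
    return next(i for i, s in enumerate(hierarchy) if constraint in s)


def constraint_demotion(pairs, constraints):
    """Demote winner-mark constraints until every pair is accounted for."""
    hierarchy = [set(constraints)]                  # single-stratum initial state
    changed = True
    while changed:
        changed = False
        for loser_marks, winner_marks in pairs:
            # Stratum of the highest-ranked loser mark (assumes at least one).
            top = min(stratum_of(hierarchy, c) for c in loser_marks)
            for c in winner_marks:
                current = stratum_of(hierarchy, c)
                if current <= top:                  # not yet dominated: demote
                    hierarchy[current].discard(c)
                    if top + 1 == len(hierarchy):
                        hierarchy.append(set())
                    hierarchy[top + 1].add(c)
                    changed = True
    return [s for s in hierarchy if s]


pairs = mark_data_pairs("bet", VIOLATIONS)
print(constraint_demotion(pairs, ["NoVoicedCoda", "Ident-IO(voice)", "VOP"]))
```

On this principle Ident-IO(voice) is demoted below NoVoicedCoda rather than below VOP, and repeated passes over the two informative pairs converge on the three-stratum hierarchy in (33), whichever pair is processed first.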
Starting with Prince and Smolensky (1993), phonological research has turned
up a number of cases of non-uniform constraint application, and provided com-
pelling analyses in terms of ranked constraints. Evidence of non-uniform con-
straint application in child language, and the accompanying analysis in terms of
ranked constraints, yields an important formal parallel between phonological
theory and child phonology, and a strong argument for an Optimality theoretic
approach to the latter domain.
Two such cases are presented in Amalia Gnanadesikan’s chapter in this vol-
ume, ‘Markedness and faithfulness constraints in child phonology’. Here we
shall present the simpler of the two so as to provide an explicit example of
the role of ranked constraints in child phonology; we take some liberties with
Gnanadesikan’s analysis for the sake of expository ease. The evidence for non-
uniformity comes from the activity of constraints against high sonority onsets in
the forms produced by an English-learning child. In cluster reduction, we find
that the sonority of the segments determines which consonant is deleted, with
the higher sonority segment being lost. For example, a stop-liquid cluster will
lose the liquid, rather than the stop (e.g., [piz] please). This can be attributed to
a constraint against liquid onsets (for our purposes *L-Ons). Outside of clus-
ter reduction, however, approximant onsets freely occur (e.g., [læb] lab). This
would immediately raise a paradox in a theory of inviolable constraints: how
could a constraint that is active in cluster reduction be violated elsewhere? In
OT, the answer would be that a higher ranked constraint usually forces *L-Ons
to be violated, but that this constraint does not interfere with the satisfaction of
*L-Ons in cluster reduction.
One such constraint is Max-IO, the faithfulness constraint that blocks segmental
deletion by requiring every input segment to have an output correspondent
(McCarthy and Prince 1999). The tableau in (39) illustrates the effect of
ranking this constraint above *L-Ons when the input onset is a singleton:
As this tableau shows, the dominance of Max-IO generally rules out deletion
as a means to satisfy *L-Ons. Cluster reduction is different because in this
context, Max-IO must be violated, due to the dominance of a constraint against
clusters, *Complex. Regardless of which consonant is deleted, Max-IO
will be violated, so the decision is passed down to the lower ranked *L-Ons
constraint:
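The non-uniform behaviour of *L-Ons can be reproduced with a small evaluator. The sketch below assumes the ranking *Complex » Max-IO » *L-Ons, broad ASCII transcriptions (`pliz` for please, `lab` for lab), hand-picked candidate sets, and candidates that differ from the input only by deletion; the constraint definitions and function names are our own simplifications.

```python
VOWELS = set("aeiou")
LIQUIDS = set("lr")


def onset(form):
    """Word-initial consonants up to the first vowel."""
    for i, seg in enumerate(form):
        if seg in VOWELS:
            return form[:i]
    return form


def complex_onset(inp, out):
    """*Complex: one mark if the onset has more than one consonant."""
    return int(len(onset(out)) > 1)


def max_io(inp, out):
    """Max-IO: one mark per deleted segment (deletion-only candidates assumed)."""
    return len(inp) - len(out)


def l_ons(inp, out):
    """*L-Ons: one mark per liquid in the onset."""
    return sum(seg in LIQUIDS for seg in onset(out))


RANKING = [complex_onset, max_io, l_ons]   # *Complex >> Max-IO >> *L-Ons


def evaluate(inp, candidates):
    return min(candidates, key=lambda out: tuple(c(inp, out) for c in RANKING))


print(evaluate("pliz", ["pliz", "piz", "liz"]))  # piz
print(evaluate("lab", ["lab", "ab"]))            # lab
```

For the singleton onset, Max-IO rules out deletion and *L-Ons is violated; for the cluster, both reduced candidates tie on Max-IO, so the decision falls to *L-Ons and the liquid is lost.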
(41) Until 1:9.9 adult target onsets other than plosives are either realised
with an initial plosive . . . or deleted by Jarmo
markedness constraints, and these form possible intermediate grammars for the
Dutch-learning child. Levelt and van de Vijver show that only a small number
of such possible intermediate stages are attested in a corpus of data on the ac-
quisition of Dutch. They argue that the limited range of developmental paths
can be explained by considering the frequency of the various syllable types in
caretaker language. The transition between developmental stages involves de-
motion of a markedness constraint beneath faithfulness. When there is a choice
of markedness constraints to be demoted, it is made on the basis of the frequency
of the syllable type that will be added to the child’s repertory.
necessary assumption, however; one might also claim that constraints emerge
in acquisition in response to articulatory and perceptual pressures (see, e.g.,
Bernhardt and Stemberger 1998, Boersma 1998, Hayes 1999). The universality
of such phonetic and cognitive factors would then be held to explain the ob-
served activity of similar constraints across languages, and across developing
grammars. It is difficult to tease these innatist and emergentist accounts apart
empirically in terms of their predictions about child language. One source of
evidence in favour of an (at least partially) emergentist stance may be the oc-
currence of phenomena in child speech that are unattested typologically. The
prototypical case is that of long-distance assimilation of primary place features
between non-adjacent consonants, usually referred to as consonant harmony.
Given that the pattern is unattested typologically, it would seem unlikely that it
is produced by typologically derived constraints. However, Optimality theoretic
analyses of consonant harmony have diverged on this issue; while Pater (1997)
takes this as evidence for a child-specific articulatorily based constraint, Goad
(1997) constructs an analysis using Alignment constraints that are claimed
to be active cross-linguistically (see also Levelt 1995, Dinnsen, Barlow, and
Morrisette 1997, Bernhardt and Stemberger 1998, and Rose 2000 on consonant
harmony in Optimality Theory).
Innatist and emergentist theories appear to make different predictions about
the nature of constraints. Under the emergentist view, the constraints should
rather directly mirror the articulatory and perceptual factors on which they are
based, while an innatist theory would expect that at least some constraints (if
not all) should be purely formal in nature, with no direct phonetic motivation
(see Boersma 1998 for discussion). The innatist perspective is defended in
Heather Goad and Yvan Rose’s chapter, ‘Input elaboration, head faithfulness,
and evidence for representation in the acquisition of left-edge clusters in West
Germanic’. They take the sonority-based analysis of onset reduction put forth
by Gnanadesikan and others as representative of a relatively phonetically based
approach to the phenomenon. They point to another pattern of onset reduction
attested in child speech that they term the ‘head pattern’, which differs from
the sonority pattern in that [s]-initial clusters always lose the [s], even when the
second member of the cluster is higher in sonority. They argue that an account
of this pattern requires a structurally elaborated syllable structure that encodes
the difference between all other obstruent-initial clusters and [s]-initial clusters:
the former are left-headed branching onsets, while in the latter, [s] is analysed
as an adjunct, and head status is trivially assigned to the consonant following it.
Faithfulness to the head position favours the preservation of the second member
of [s]-initial clusters, but the initial member of all other obstruent-initial clusters.
Since headedness is in this context a purely formal, rather than functional,
principle, Goad and Rose take this to suggest that the substantive content of
constraints, and the representations they refer to, is not just functional.
This was confirmed by the Headturn Preference Procedure for children at 4.5,
10, and 20 months, although no significant difference was found at 15 months.
Joe Pater’s chapter ‘Bridging the gap between receptive and productive de-
velopment with minimally violable constraints’ draws on previously published
studies of infant speech perception to make the case that receptive acquisition
follows a course similar to that of productive development: initial stages permit
only unmarked structures; more complex structures emerge later. To account
for these parallels, Pater develops a model in which markedness constraints
apply in perception as well as in production. Since receptive development does
typically precede the development of production, it is necessary to allow for
differences in the complexity of structures permitted at a single time. Pater’s
proposal is that faithfulness constraints can be indexed to perception or pro-
duction, thus allowing for a situation in which perceptual representations are
of greater complexity than those created for production.
error-driven (Tesar and Smolensky 1998), this ranking would be a trap, and the
learner would not be guaranteed to converge on the correct NoCoda » Dep
hierarchy. This can be seen as an instance of the Subset problem discussed in
Principles and Parameters theories (e.g., Berwick 1985); the language produced
by M » F (e.g., NoCoda » Dep; with only V rimes) is a subset of the language
produced by F » M (e.g., Dep » NoCoda; V and VC rimes). Since all of the
data of the subset language are consistent with the superset language, positive
evidence alone will not move the learner out of the superset state.
The solution to this problem suggested by researchers such as Demuth (1995),
Gnanadesikan (1995, this volume), Levelt (1995), and Smolensky (1996a,
1996b) is to posit an initial state in which all of the markedness constraints
outrank the faithfulness constraints. Positive evidence will be available for any
Faithfulness » Markedness rankings that are inconsistent with this initial state.
to establish whether Dutch 3- and 4-year-olds have mastery of the Dutch stress
system. Based on elicited production data of real and nonsense words, they find
that the answer to this question seems to be an affirmative one, and also that
a developmental pattern can be detected from one age group to the next. The
chapter goes on to show, however, that some subtle patterns in these experimen-
tally collected data are not captured by any existing standard analysis, whether
formulated in rules, parameters, or constraints. It is indicated that, with hindsight,
these patterns occur in the adult system too, and an analysis is proposed with
two properties: irregularity is treated with the aid of deviating hierarchies; and
in some of these hierarchies constraints become visible only because they are
active in subpatterns first discovered in the child language experiment. In par-
ticular, Zonneveld and Nouveau uncover evidence in their experiment for an
undominated *Clash constraint, whose activity is generally masked by other
constraints.
Shigeko Shinohara’s chapter, ‘Emergence of Universal Grammar in foreign
word adaptations’, presents a study of the adaptation of French loanwords
by Japanese speakers. Shinohara discusses patterns of segmental change and
insertion as well as accent placement. Some of the phenomena point to the
activity of constraints that govern the phonology of the language as a whole,
but others, such as avoidance of stressed epenthetic vowels, and stem–syllable
alignment, implicate constraints that are uniquely active in the loanword phonol-
ogy. Since these constraints do have considerable cross-linguistic justification,
Shinohara takes them to be part of the constraint set supplied by Universal
Grammar. For the most part, their activity in loanword adaptation can be un-
derstood as the emergence of latent M » F rankings, but some ranking among
the markedness constraints is also required. These rankings of markedness
constraints, Shinohara suggests, are potentially universal, and derivable from
phonetic scales (see Prince and Smolensky 1993: ch. 5).
Davidson, Jusczyk, and Smolensky present an experimental paradigm that is
designed to induce speakers to subject non-native inputs to their English gram-
mars, to quantitatively assess the M » F-based prediction that non-English clus-
ters would be repaired to meet the requirements of English syllable structure.
In the condition that best approximated this prediction, non-English clusters di-
vided into several groups which could be ordered according to their probability
of repair. They show that appropriate interaction of constraints independently
needed to bar non-English clusters can account for the relative markedness of
different non-English clusters, suggesting a final English ranking that makes
such distinctions, without apparent motivation in the English data in which none
of these clusters appears. They go on to suggest possible analyses of how such
a ranking might arise, and point out the importance of such ‘hidden rankings’
in the final state for pursuing the hypothesis that the initial state for second
language acquisition is the final state for first language acquisition.
5. Conclusion
In this introduction, we have emphasised the theme of formal and substan-
tive connections between phonological theory and child phonology. Across
theories, a basic formal connection is made by positing mappings between un-
derlying and surface representations in child phonology, analogous (but not
necessarily equivalent) to such mappings in phonological theory. In rule-based
theories this connection is strengthened by arguing that the rules that perform
the mappings in both domains are similar in terms of their formal makeup, as
well as in how they interact with one another, specifically, through ordering
(Stampe 1969, 1973a, 1973b, Smith 1973). Similarly in OT, constraints are
argued to interact through ranking in child language as well as in mature gram-
mars. Substantive connections between child and adult phonology were made in
rule-based phonology by positing a set of processes that apply in both domains
(Stampe 1969, 1973a, 1973b), and in constraint-based phonology, by having
constraints that apply to child and adult grammars. Many of the same issues
that confronted earlier attempts to connect child phonology and phonological
theory continue to apply today; this is particularly obvious from a reading of
Lise Menn’s contribution to this volume, ‘Saving the baby: making sure that
old data survive new theories’. At the same time, however, we should not un-
derestimate the progress that has been made on several fronts. Along with the
continued discovery of basic formal and substantive parallels between child and
adult phonology, recent research has succeeded in providing explicit proposals
about difficult issues such as the learnability of phonology, variation, similari-
ties and differences in comprehension and production, and the genesis of con-
straints. The rapid progress that is being made, combined with the wide range
of issues that remain to be explored, makes the intersection between phonological
theory and phonological acquisition an exciting area for ongoing
research.
notes
1. Those wishing to expand their knowledge of its subject-matter beyond what is dis-
cussed in this section, and/or to familiarise themselves with the view of others on
similar material and issues, may wish to consult a number of other texts: out of some
handfuls we recommend Ingram (1989), Ferguson, Menn, and Stoel-Gammon (1992),
Fletcher and MacWhinney (1995), Vihman (1996), Jusczyk (1997), Bernhardt and
Stemberger (1998), and Tesar and Smolensky (1998, 2000).
2. First published in German in Uppsala, Sweden, when Jakobson was in Norway as a
World War Two refugee; this version was reprinted in 1962 in Selected Writings I.
An English translation, from which we quote here, appeared in 1968.
3. The expected value of the opposition is often called the ‘unmarked’ one, the un-
expected value the ‘marked’ one (in this example: stop is unmarked, fricative is
marked).
4. Kaye (1974) argues that some cases of opaque rule interaction may be interpreted
as functionally motivated in that they contribute to the recoverability (in a technical,
non-learning sense) of underlying representations, namely if the opaque derivation
produces a segment that occurs nowhere else in the language, as when [ŋ] is the
unique product of assimilation > velar deletion. Notice, however, that this is not a
case of counterfeeding. For further discussion, see McCarthy (1999a: section 3.1).
5. Smith does not literally claim that underlying forms of the child grammar will by
definition be always identical to the adult form: empirical evidence may suggest
otherwise. Consider the rule of Consonant Harmony, whereby a coronal consonant
becomes velar under the influence of a velar later in the word; [εi ] for taxi,
[ɔ k] for talk. The rule also neutralised take and cake, for instance. When the rule
disappeared, in ‘hundred or more examples’ (p. 144) the completely regular coronal
appeared, as expected; as a single exception, the verb take remained [eik], later
[kh eik]. What apparently had happened was that Amahl assumed the underlying
form of this verb to be /keik/, only turning to a different assumption, leading to the
correct output, in the face of consistent positive evidence.
Recently, and more fundamentally, Macken (1995) distinguishes between three
acquisitional rule types (related to perception, articulation, and generalisation) that
each have their own functional and developmental characteristics. Her model allows
for underlying representations which are the same as surface representations in cases
of perception-based neutralisation, which is typically eliminated slowly and word by
word. Interestingly, Macken (1980) eliminated Smith’s velarisation rule ( = 5a) in
this latter manner.
6. See, e.g., Anderson (1985: 342 ff.) for a discussion and an assessment; some of NGP’s
leading ideas have resurfaced in a much different form in OT.
7. For general introductions to OT, we refer to Archangeli and Langendoen (1997),
Kager (1999), and McCarthy (2001).
8. The duplication problem was recognised as early as SPE (Chomsky and Halle
1968: 382): ‘Thus certain regularities are observed within lexical items as well as
across boundaries – the rule governing voicing in obstruent sequences in Russian,
for example – and to avoid duplication of such rules in the grammar it is necessary
to regard them not as redundancy rules but as phonological rules that also happen
to apply internally to a lexical item.’ This solution to the duplication problem, to
order morpheme structure rules among the other phonological rules, was shown to
be insufficiently general by Kenstowicz and Kisseberth (1979), for the reason that it
cannot handle cases in which a phonological rule is blocked from applying because
its output would violate a static condition on the lexicon. After discussing examples
from Russian and Tonkawa, Kenstowicz and Kisseberth (1979: 433) conclude: ‘In
both cases a constraint on UR affects the application of a phonological rule – by
adjusting the output of the rule in one case and by preventing application of the rule
in the other. An ordering solution works for the former but not the latter. Thus, the
ordering solution cannot be accepted as a totally general solution to the duplication
problem. It would seem that once a way is found to express the conspirational
relation between MSRs and the application of phonological rules in examples such
as the Tonkawa one, the duplication involved in examples such as Russian should
fall out as a special subcase.’ This sketches in essence the approach taken in OT:
surface phonological constraints account for generalisations on lexical items, and also
function to trigger and block phonological changes with respect to the input – that
is, alternations.
9. There is one case which is not covered by a possible word test based on identity
mapping. Chain shifts (see Kirchner 1996 for an OT analysis) are mappings in which
an input /A/ is mapped onto [B], while input /B/ maps onto [C]. If ‘B’ undergoes
the test, it will change, and hence fail the test; however, [B] is also the legitimate
output of the mapping /A/ ⇒ [B].
10. The fact that Dutch phonology has an alternative way of repairing the input /kd/ (by
regressive voicing assimilation) into [d] is beside the point: both mappings involve
a ranking indicated in (25).
11. In the OT literature, loanword adaptations have been argued to bear on various issues,
such as the universality of markedness constraints, as opposed to the language-
specific nature of rewrite rules. References include Yip (1993), Itô and Mester
(1995), Paradis (1996), Paradis and LaCharité (1997), Gussenhoven and Jacobs
(2000), LaCharité and Paradis (2000), and the contributions in this volume by
Shinohara, and Davidson, Jusczyk, and Smolensky.
12. For another approach to the issue of the learnability of OT grammars, see Pulleyblank
and Turkel (1998, 2000).
13. See also Demuth (1997), Curtin (2002), Curtin and Zuraw (forthcoming), and
Pater and Werle (2001) on the use of these models to deal with variation in
acquisition.
14. See Reynolds (1994) for a slightly different view of variation in OT.
References
Anderson, S. R. (1985). Phonology in the Twentieth Century. Theories of Rules and
Theories of Representations. The University of Chicago Press.
Anttila, Arto (1997). Variation in Finnish Phonology and Morphology. Ph.D. disserta-
tion, Stanford University.
Archangeli, D. (1997). Optimality Theory: an introduction to linguistics in the 1990s.
In D. Archangeli and D. T. Langendoen (eds.) Optimality Theory: an Overview.
Malden, Mass.: Blackwell Publ. 1–32.
Archangeli, D. and D. T. Langendoen (eds.) (1997). Optimality Theory: an Overview.
Oxford: Basil Blackwell.
Archangeli, D. and D. Pulleyblank (1994). Grounded Phonology. Cambridge, Mass.:
MIT Press.
Barlow, J. (1997). A Constraint-Based Account of Syllable Onsets: Evidence from
Developing Systems. Ph.D. dissertation, Indiana University.
Barlow, J. and J. Gierut (1999). Optimality Theory in phonological acquisition. Journal
of Speech, Language, and Hearing Research 42. 1482–1498.
Beckman, J. (1998). Positional Faithfulness. Ph.D. dissertation, University of
Massachusetts.
Benua, L. (1997). Transderivational Identity: Phonological Relations Between Words.
Ph.D. dissertation, University of Massachusetts. [ROA-259] http://roa.rutgers.edu
Bernhardt, B. H. and J. P. Stemberger (1998). Handbook of Phonological Development.
San Diego: Academic Press.
Velleman, S. L. and M. M. Vihman (2000). The optimal initial state. Paper read at the
Annual Meeting of the Linguistic Society of America, Chicago, January.
Vihman, M. M. (1996). Phonological Development: the Origins of Language in the
Child. Cambridge, Mass.: Blackwell.
Yip, M. (1993). Cantonese loanword phonology and optimality theory. Journal of East
Asian Linguistics 2. 262–291.
Zoll, C. (1998). Positional asymmetries and licensing. MS., Cambridge, Mass.: MIT.
2 Saving the baby: making sure that old data
survive new theories*
Lise Menn
1. Introduction
At the peak of the influence of Generative Phonology, Charles Ferguson and
Carol Farwell of the Stanford Child Phonology Project published a troubling
observation (Ferguson and Farwell 1975), based on longitudinal data: they
showed that the construct of ‘phoneme’ did not (indeed, does not) do justice
to the patterns of variation in some young children’s production of speech
sounds (see also Menyuk et al. 1986). But data which, like these, do not fit any
recognisable theory are like a strange tool that comes without instructions, or
an unfamiliar spice for which one has no recipe. They stay in a box, untouched.
Perhaps they are retrieved from storage when some use for them comes along;
but more likely, they are only noticed lurking there after something similar has
been rediscovered somewhere else, at a time and place where the odd item can
at last be assimilated (cf. the rediscovery of Mendelian heredity).
Once we do have a new beautiful theory that can handle data which were
previously intractable, we tend to go on a binge: we try to show that it can do
almost everything. That is good – we have to check out its power. But we also
tend to shove all the data that do not fit the new theory into that storage box,
whether or not they were well handled by a previous approach.
What every field needs, I think, is an inventory – a kind of second-order
data base. What old unassimilated results are in the storage box? What’s been
shoved back in there recently? Let’s bring them out and highlight them as
exhilarating challenges, like David Hilbert’s famous list of unsolved problems
of mathematics, or the Guinness list of world records. Reality is richer and
more complex than any of its approximations; for the scientist, as well as for
the developing child, glee at what we have just learned to do soon has to be
replaced by a new round of efforts to approximate the complex reality-out-there.
It is, of course, not the case that a theory needs to explain all the phenomena
related to its domain. I think that a number of the properties of early phonology
which I shall describe are probably not data that Optimality Theory (OT; Prince
and Smolensky 1993) should try to account for – for example, the problem that I
started out with, of speech sounds that fail to be phonemes in any normal sense,
because they are conditioned by the lexical items they appear in rather than
by any phonological property of those items. (This behaviour is rare but not
unknown for adult language – consider the huge range of phonetic variation of
the word no vs. the smaller range of variation of the word know in its normal uses
vs. the different set of variations in the filler phrase y’know.) But by bringing
my inventory to the attention of current OT theorists, I hope to encourage
metatheoretical reflection: what is OT about, what can and can’t it account for,
and why can it account for some things but not others? In other words, what
is the proper domain of OT, and what should people be using other tools to
account for?
This chapter presents an overview of the history of the study of constraints in
child phonology, and then a start at a Historical Annotated Inventory of Things
We Know About Child Phonology. I shall pay attention both to the necessity
for a constraint-based theory like OT, and to the types of data that I think
currently pose challenges to this powerful new paradigm. As I am not active in
this rapidly developing area, I am sure that some of the problems I raise will
have been dealt with – perhaps in several ways – since the publication of the
OT papers I have been able to study. But what I have been requested to do is to
provide a historical perspective on OT as a theory of phonological development
for this volume, and as a member of the second generation of research in child
phonology (principally, in the US, the students of Charles Ferguson and Paula
Menyuk), I have been around long enough to do that. Again, I am not an OT
expert; if the theory has already been enriched so that it can deal with some of
the issues I raise, so much the better for OT!
The big problem in dealing with phonotactic constraints was, as I have in-
dicated, not the notion or the motivation, but the formalism. The rewrite rules
of generative grammar became available to phonology with the publication
of Halle (1959), and swept over the linguistic world much as OT has done.
However, while rewrite rules can capture roughly the same regularities as out-
put constraints, their focus on input-and-change makes the constraints them-
selves hard to see. Jakobson (1941/1968), the first linguistic theorist to take
child language seriously, did not attempt to deal with constraints beyond the
level of phoneme inventory; however, his student Lawrence Jones published an
approach to a generative phonotactics of English called ‘English phonotactic
structure and first-language acquisition’ (the invocation of language acquisition
in the title was purely speculative). Jones’s approach was to capture the struc-
ture of English words by generating them from a ‘Word’ node using rewrite
rules, just as generative syntax captured sentence structure by rules starting
from an ‘S’ node (Jones 1967). I happened to hear Jones lecture on this work
at about that time, while I was analysing the ‘Daniel’ acquisition diary data,
and I was tremendously excited. With his help, I cast my data in his formal-
ism, thereby producing the rule-jungle called ‘Phonotactic rules in beginning
speech’ (Menn, 1971).
However, Jones’s formalism was a methodological dead end, because it was
an intuition-killer. First, it was just as difficult to get a sense of what the output
constraints were like as it is with any other rewrite rule formalism; the output
constraints still are formal epiphenomena. Second, word-generation rules are
also no help in showing the relation between underlying morphemes and output
phonemes, which is the other thing we want to know about. Capturing that
relation requires morpheme-based rules, such as the rewrite rules of classical
generative phonology.
Nevertheless, Jones’s approach did call attention to the importance of con-
straints on the output form of the word. With its help, one could capture the
overall difference between the child’s word and its adult model like this: the
child’s output word-grammar contains much less information than the adult
form, in that there are fewer choices among phonemes, fewer phonemes per
word, and fewer admissible sequences of phonemes. The child form has less to
specify, less to remember, fewer articulatory movements to control.
This was a move towards explanation in less grand terms than Jakobson’s –
a common-sense (or, if you prefer, empiricist) reaction to ‘Laws of Irreversible
Solidarity’. (It also reflected my caution as a pre-doctoral researcher who was
working with a one-child data set.) No innate knowledge or pattern was invoked;
constraints were seen as the outcome of a limited ability to process articulatory
instructions. The information-theoretic formulation also avoided appealing to
‘ease of articulation’ explanations, because after all, we had no independent
evidence that what the child said was ‘easier’ than the adult model (indeed,
we still have none). On the contrary, what we needed to ask was: what kind of
creature is the child such that guck may be easier than duck? And why do some
children prefer dut, others da, etc.? This is the task that I think the field is still
about.
Meanwhile, David Ingram (1974), working with Charles Ferguson, devel-
oped a much more tractable approach to child phonology, dealing with the
constraints in terms of Stanley’s morpheme structure conditions (1967). After
his example, child phonologists used rules only to derive child forms from adult
ones. (For more details, see Vihman 1996: chapter 2.) And the field soon had
an important caution about constraints to bear in mind: the fact that not all
of a child’s rules at a given time can be accounted for by constraints that are
synchronically valid. Some of the rules are needed to capture held-over patterns
of dealing with adult sounds that were established earlier in the child’s devel-
opment. A famous case is the puzzle-puddle-pickle story from Smith (1973),
reanalysed brilliantly by Macken (1980); in Pater’s OT bibliography (2000) an
unpublished paper by Dinnsen et al. (2000) is cited that now attempts to deal
with this phenomenon in OT.
Rules that are not motivated by synchronic constraints thus pose an impor-
tant challenge to all constraint-based approaches. Indeed, the constant changes
in a developing phonology create problems that even a typical rule-based ap-
proach cannot handle without much ad-hoc machinery. Smith, Macken (op.
cit.) and many others followed mainstream Generative Phonology in assuming
a single underlying form; for child phonology, this was taken to be approx-
imately equal to the adult surface phonemic form. The rules then performed
an on-line mapping from adult to child form. Such a model is able to cap-
ture the regularities of mappings from adult surface form to child’s produced
form, but it is unable to capture output constraints – also, it cannot account for
phonological idioms and other lags in updating the mapping. It is also unable to
handle ‘cross-talk’ between forms (rule interaction), as is true for any generative
model.
To handle phonological idioms and other rule-updating lags, I developed a
‘two-lexicon’ model, articulating it with help from Paul Kiparsky (Menn 1976,
Kiparsky and Menn 1977). Such models assume a stored input representation
which is close to the adult surface form, plus an output representation which
is derived from the input form, but also stored. (The term ‘two lexicons’ is
slightly misleading, since only a single semantic representation is postulated.)
The stored output form is accessed on-line, and updated from the input form ir-
regularly. The two-lexicon model yielded some improvement: in addition to cap-
turing the regularities of mappings from adult surface form to child’s produced
form and lags in updating the mapping, it provided a conceptual framework
Child speech, from the possibly prephonemic early stage and on through perhaps the
first nine months of speaking, is subject to severe output constraints, stronger than
anything found in adult phonology . . . Rewrite-rule notation can handle such phenomena
(Langendoen 1968), but only by brute force (Clements 1976), not perspicuously . . . This
is the area in which generative phonology was seriously deficient with respect to British
(Firthian) prosodic phonology, and where Waterson’s 1971 prosodic approach to child
phonology yielded important insights: For example, that the child might be seeking out
whole-word patterns in the adult language that were similar to whole-word patterns that
the child had learned to produce. (p. 31)
. . . the child’s ‘tonguetiedness’, that overwhelming reality which Stampe and Jakobson
both tried to capture with their respective formal structures, could be handled more
felicitously if one represented the heavy articulatory limitations of the child by the
formal device of output constraints . . . The child’s gradual mastery of articulation then
is formalized as a relaxation of those constraints . . . (pp. 35–36)
. . . it will help to develop some terms for dealing with sets of rules which appear to serve
some common function. Suppose none of the forms produced by a child contain conso-
nant clusters, or that none have final stops, or that none have disharmonic [consonant]
sequences. A statement that a particular sound-pattern does not appear in a corpus and
is not expected to appear if we get a larger sample is a statement of an output constraint.
Adult languages have output constraints as well; consonant clusters are absent from
many languages, and every language has restrictions on how many and what kind of
consonants form a pronounceable cluster (Bell 1971). Vowel harmony, present in quite
a number of languages, is also describable as an output constraint.
Following Kisseberth (1970), when we have a set of rules that all contribute to elim-
inating sound patterns which would violate a particular output constraint, we say that
those rules form a conspiracy . . .
Conspiracies of rules are not the only devices that children use to maintain output
constraints, however. Selection strategies may also contribute – children may avoid adult
words which violate a constraint . . . (Menn 1983: 16–17)
One might think that this was merely a way of avoiding commitment to
the construct of output constraint, since it was not yet widely accepted and
it belonged to no phonological theory then articulated. But there were some
properties of the data that the construct could not handle, and which are still
problematic today. One is the idiosyncratic individuality of some of the con-
straints which seem to develop from favourite babble patterns (Vihman 1996).
Another is the fact, already noted, that some of them may not be present at the
onset of speech. This delayed onset of presumably innate behaviours was also
a problem for Stampe’s notion of innate natural processes. And so I added:
. . . one cannot explain lexical exceptions [to rules] or overgeneralizations within a theory
[which holds] that the acquisition of phonology is purely a matter of overcoming output
constraints, as I might have tempted you to think . . . Such a theory would be subject to
exactly the same inadequacies as Jakobson’s in these cases . . . if we want a functional,
explanatory theory . . . , we need a theory which is more complicated. (pp. 28–29)
To be more explicit: there are standard hard problems for theories that operate
only with across-the-board changes – whether these are conceptualised as the
acquisition of phonemic contrasts, the suppression of natural processes, or the
re-ordering of constraints. One of these is accounting for lexical exceptions to
general rules/constraints, like progressive phonological idioms (special cases
of words that violate otherwise general constraints) and regressive overgeneral-
isation (words that escape a constraint and then later become subject to it). How
can a constraint be innate if it neither applies from the beginning nor marks a
maturational advance?
A related hard problem is how to handle fossils (regressive phonological
idioms), that is, forms that do not ‘update’, but instead maintain an older
mapping from the adult form to the child form, even though new forms are
being made with a newer mapping. For example, a child whose newer vocabu-
lary shows that she has learned to overcome consonant harmony may still use
harmonised forms for older words – that is, the words that she began to use
before she was able to violate that constraint. Again, this was the problem that
stimulated the development of a ‘two-lexicon’ model; either the output forms
or the mapping rules that derive them must be stored.
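The storage idea behind fossils can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the class name TwoLexicon, the two mapping functions, and the orthographic stand-ins for child forms are all hypothetical and are not taken from the two-lexicon literature. It shows how a stored output form can keep reflecting an older mapping after the mapping itself has changed.

```python
# Toy 'two-lexicon' sketch. Illustrative only: the class, the mappings,
# and the spelled-out word forms are hypothetical assumptions.

def harmonise(adult):
    """Older mapping: initial 'd' becomes 'g' if a velar (g/k) follows later in the word."""
    if adult[0] == "d" and ("g" in adult[1:] or "k" in adult[1:]):
        return "g" + adult[1:]
    return adult

def faithful(adult):
    """Newer mapping: the child form equals the adult form."""
    return adult

class TwoLexicon:
    def __init__(self, mapping):
        self.mapping = mapping    # current adult-to-child mapping
        self.input_forms = {}     # word -> stored adult-like input form
        self.output_forms = {}    # word -> stored child output form

    def learn(self, word, adult_form):
        """Store the input form and derive the output form once, with the current mapping."""
        self.input_forms[word] = adult_form
        self.output_forms[word] = self.mapping(adult_form)

    def say(self, word):
        """Production accesses the stored output form; nothing is re-derived on-line."""
        return self.output_forms[word]

    def update(self, word):
        """Irregular, word-by-word update of a stored output from its input form."""
        self.output_forms[word] = self.mapping(self.input_forms[word])

child = TwoLexicon(harmonise)
child.learn("duck", "duck")   # acquired while harmony still holds
child.mapping = faithful      # harmony is later overcome
child.learn("dog", "dog")     # new word, derived with the newer mapping
print(child.say("duck"))      # guck (fossil: old stored output)
print(child.say("dog"))       # dog
child.update("duck")
print(child.say("duck"))      # duck
```

In this sketch the fossil arises because the output form is stored; the alternative mentioned in the text, storing the older mapping rule with the word, would need a per-word mapping instead.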
So much for history; for those who wish to look further into the substantial
older literature on child phonology, I suggest starting with the handbook chap-
ters by Gerken (1994) or Menn and Stoel-Gammon (1995), and then for a more
comprehensive look at pre-OT work, Yeni-Komshian et al. (1980), Ferguson
et al. (1992), and Vihman (1996), and going back to the primary sources cited
therein. For emergence of prosody in production, see Snow and Stoel-Gammon
(1994) and papers by Gerken and colleagues (e.g., Gerken and McIntosh 1993,
Gerken 1994). For more recent work in connectionist modelling approaches
to OT, see Stemberger and Bernhardt (1999); for other computational mod-
elling approaches, see Lindblom (1992), Plaut and Kello (1999), Gupta and
Dell (1999), and Markey (1994). A review emphasising both constraints and
autosegmental analysis is Bernhardt and Stemberger (1998).
(fricatives became stops if attempted). *Note again that constraints like this are
not found in adult languages; this may be a problem for OT.
except for children who are conspicuously late talkers (Stoel-Gammon 1989;
Vihman et al. 1994, Vihman 1996). This should not be a problem for OT if
ranking of constraints can be carried out during babble – that is, before the
child has specific words as output targets.
mappings, and by rule cross-talk. Possibly more tractable are the issues of con-
straints that operate across word boundaries, the need to explain the probability
envelope of variation across individuals, and the need to differentiate ‘input’ in
the sense of ‘the adult form the child is trying to approximate’ from ‘input’ in
the sense of ‘abstract strings of concatenated morphemes to be parsed’.
However, if OT does not claim to be a Theory of Everything, there is no reason
why it should be expected to meet all these challenges. The solutions to many of
them probably lie outside the scope of OT. I do not think that it is likely that any
single unified theory will be able to handle all the phenomena of the acquisition
of phonology; expecting this would be an impossible standard for OT or any
of its competitors. Furthermore, there is no reason to believe that concepts
developed to handle only adult phenomena will be adequate for describing
acquisition; the unskilled speaker is not just a miniature of the skilled speaker.
The stance taken by Ferguson and Farwell (1975: 437) is no less valid today:
‘Our approach is to try to understand children’s phonological development in
itself in order to improve our phonological theory, even if this requires new
constructs for the latter.’
Instead of a single unified theory, I suggest that several good partial theo-
ries (or models or accounts) are needed, each handling various aspects of the
phenomena of our Inventory. Each of these theories/models/accounts should be
psychologically responsible; by this, I mean that they should be mappable into
psycho-linguistic concepts, for at some level the psycho-linguistic constructs of
auditory similarity, articulatory control, memory, maintenance of distinction,
generalisation, and information flow are the primitives of phonological expla-
nation. Below these, in turn, lie the mechanics of neural computation, sensory
response, and muscle innervation; it must be at this level that the probabilities
of finding the various types of constraints and mappings are determined.
It may be that some facts about phonology are indeed not explainable in
such psychological terms, but to concede this before one is forced to do so is a
counsel of despair. Calling a rule ‘natural’ or a constraint ‘innate’ is like calling
a hurricane ‘an act of God’; having said so, one knows no more than before
about how to predict it or to understand its causes. We can do better than that.
Acknowledgements
*I am grateful to my students George Figgs, Holly Krech, Matthew Maraist,
and Hiromi Sumiya for their thoughtful discussion of many issues raised above;
to Shelley Velleman and Marilyn Vihman for sharing the manuscript of their
Linguistic Society of America (LSA) paper; to Shelley Velleman and LouAnn
Gerken for comments on an early draft of this paper; to Bruce Hayes for making
a draft of his chapter for this volume available on the web; and to Bill Bright,
as always, for editorial advice throughout.
References
Anttila, A. (1997). Deriving variation from grammar. In F. Hinskens, R. van Hout
and L. Wetzels (eds.) Variation, Change and Phonological Theory. Amsterdam:
Benjamins.
Bell, A. (1971). Some patterns of occurrence and formation of syllable structures. In
Working Papers on Linguistic Universals 6. 23–137. Stanford University Linguis-
tics Department.
Bernhardt, B. H. and J. P. Stemberger (1998). Handbook of Phonological Development.
San Diego: Academic Press.
Boersma, P. and B. Hayes (1999). Empirical tests of the gradual learning algorithm.
[ROA 348, http://roa.rutgers.edu] Published version, (2001) LI 32. 45–86.
Boersma, P. and C. Levelt (2000). Gradual constraint-ranking learning algorithm predicts
acquisition order. Proceedings of Child Language Research Forum 30. 229–237.
Stanford University.
Chomsky, N. and M. Halle (1968). The Sound Pattern of English. New York: Harper
and Row.
Clements, G. N. (1976). Vowel harmony in non-linear generative phonology. MS.,
Linguistics Department, Harvard University.
Dinnsen, D. A., K. M. O’Connor, and J. A. Gierut (2000). An optimality-theoretic
solution to the puzzle-puddle-pickle problem. MS., Indiana University.
Donahue, M. (1986). Phonological constraints on the emergence of two-word utterances.
Journal of Child Language 13. 209–218.
Dresher, B. E. and H. van der Hulst (1995). Global determinacy and learnability
in phonology. In J. Archibald (ed.) Phonological Acquisition and Phonological
Theory. Hillsdale, N.J.: Erlbaum. 1–22.
Edwards, M. L. and L. Shriberg (1983). Phonology: Applications in Communicative
Disorders. San Diego: College-Hill Press.
Ferguson, C. A. and C. B. Farwell (1975). Words and sounds in early language acqui-
sition. Lg 51. 419–439.
Ferguson, C. A., L. Menn, and C. Stoel-Gammon (eds.) (1992). Phonological Develop-
ment: Models, Research, Implications. Timonium, Md.: York Press.
Gerken, L. (1994). Child phonology: past research, present questions, future directions.
In M. A. Gernsbacher (ed.) Handbook of Psycholinguistics. San Diego: Academic
Press. 781–820.
Gerken, L. and B. J. McIntosh (1993). The interplay of function morphemes and prosody
in early language. Developmental Psychology 29. 448–457.
Gupta, P. and G. S. Dell (1999). The emergence of language from serial order and pro-
cedural memory. In B. MacWhinney (ed.) The Emergence of Language. Hillsdale,
N.J.: Erlbaum. 447–482.
Halle, M. (1959). The Sound Pattern of Russian. The Hague: Mouton.
Hayes, B. (1999). Phonological acquisition in Optimality Theory: the early stages. This
volume.
(1976). Pattern, Control, and Contrast in Beginning Speech: a Case Study in the
Development of Word Form and Word Function. Ph.D. dissertation, University of
Illinois, Urbana. [Distributed by Indiana University Linguistic Club, Bloomington,
1979. Available from University Microfilms.]
(1980). Child phonology and phonological theory. In G. Yeni-Komshian et al. (eds.)
(1980). 23–42.
(1983). Development of articulatory, phonetic, and phonological capabilities. In B.
Butterworth (ed.) Language Production 2. London: Academic Press. 3–50.
Menn, L. and C. Stoel-Gammon (1995). Phonological development. In P. Fletcher
and B. MacWhinney (eds.) A Handbook of Child Language. Oxford: Blackwell.
335–359.
Menn, L. and E. Matthei (1992). The ‘two-lexicon’ model of child phonology: looking
back, looking ahead. In C. A. Ferguson et al. (eds.) (1992). 211–247.
Menyuk, P., L. Menn, and R. Silber (1986). Early strategies for the perception and
production of words and sounds. In P. Fletcher and M. Garman (eds.) Language
Acquisition 2. Cambridge: Cambridge University Press. 198–222.
Moskowitz, A. (1970). The two-year-old stage in the acquisition of phonology. Lg 46.
426–441.
Ohala, J. J. and H. Kawasaki-Fukumori (1997). Alternatives to the sonority hierarchy
for explaining the shape of morphemes. In S. Eliasson and E. Håkon Jahr (eds.)
Studies for Einar Haugen. Berlin: Mouton de Gruyter. 343–365.
Oller, D. K. and M. L. Steffens (1994). Syllables and segments in infant vocalizations
and young child speech. In M. Yavas (ed.) First and Second Language Phonology.
San Diego: Singular Publishing Group. 45–62.
Pater, J. (1997). Minimal violation and phonological development. Language Acquisition
6. 201–253.
(2000). OT acquisition bibliography. Archived on Linguist List, accessible via
CHILDES (Child Language Data Exchange).
http://listserv.linguistlist.org/cgi-bin/wa?A2=ind0006B&L=info-childes&P=R74
Peters, A. M. (1977). Language learning strategies. Lg 53. 560–573.
(1983). The Units of Language Acquisition. Cambridge: Cambridge University Press.
(1985). Language segmentation: operating principles for the analysis and perception
of language. In D. I. Slobin (ed.) The Crosslinguistic Study of Language Acquisition
2. Hillsdale, N.J.: Erlbaum. 1029–1067.
Plaut, D. C. and C. T. Kello (1999). The emergence of phonology from the interplay
of speech comprehension and production: a distributed connectionist approach.
In B. MacWhinney (ed.) The Emergence of Language. Hillsdale, N.J.: Erlbaum.
381–416.
Priestly, T. M. S. (1977). One idiosyncratic strategy in the acquisition of phonology.
Journal of Child Language 4. 45–66.
Prince, A. and P. Smolensky (1993). Optimality Theory: constraint interactions in gen-
erative grammar. MS., New Brunswick, N.J.: Rutgers University; Boulder, Col.:
University of Colorado. To appear as Linguistic Inquiry Monograph, MIT Press,
Cambridge, Mass.
Rice, K. and P. Avery (1995). Variability in a deterministic model of language acquisition:
a theory of segmental elaboration. In J. Archibald (ed.) Phonological Acquisition
and Phonological Theory. Hillsdale, N.J.: Erlbaum. 23–42.
Amalia Gnanadesikan
1. Introduction
This chapter argues that constraint-based Optimality Theory (Prince and
Smolensky 1993) provides a framework which allows for the development
of a unified model of child and adult phonology and the relation between the
two. In Optimality Theory (OT) an adult phonology consists of a set of ranked
constraints. The ranking, but not the constraints, differs from language to
language. The constraints are universal. If phonological constraints are universal,
they should be innate.1 I claim that these innate constraints are operative in
child phonology. The constraints used in adult language should therefore be
adequate to account for child phonology data as well, without attributing to the
child, as in previous theories, more representational levels or more rules than
adults have. This chapter shows that this is indeed the case.
The initial state of the phonology, I propose, is one in which constraints
against phonological markedness outrank the faithfulness constraints, which
demand that the surface form (output) is identical to the underlying form (input,
in OT terminology).2 The result is that in the initial stages of acquisition the
outputs are unmarked. The process of acquisition is one of promoting the
faithfulness constraints to approximate more and more closely the adult grammar, and
produce more and more marked forms.3 The path of acquisition will vary from
child to child, as different children promote the various faithfulness constraints
in different orders.
In adult languages certain faithfulness constraints outrank certain markedness
constraints. This is required by the need to have enough contrasts to support a
large lexicon. By interspersing the markedness and faithfulness constraints in a
ranked hierarchy, every adult language balances markedness and unmarkedness.
Each language will differ from others in its ranking of particular constraints,
and so each language will express markedness and unmarkedness in differ-
ent ways. The child, who begins with dominant markedness constraints – and
hence unmarked outputs – has the task of achieving the particular ranking of
unmarkedness and faithfulness found in her target language.
there is underlyingly more than one onset consonant. The sonority constraints
are too low ranked to have an effect on marked underlyingly single onsets. The
Obligatory Contour Principle (OCP) also emerges to influence the selection of
the onset consonant in some cases, but it does not rank highly enough to do so
in all cases.
The onset consonants I discuss in this chapter are specifically word-initial or
stressed-syllable-initial. Section 3 introduces G’s reduction of onset clusters,
showing how *Complex outranks faithfulness. Constraints on sonority select
the least marked onset. Section 3 assumes that G has segmentally accurate (that
is, adult-like) inputs. Section 4 demonstrates, based on the evidence of a dummy
syllable, that G’s inputs are indeed segmentally accurate. It discusses cases of
onset selection in the presence of the dummy syllable, where the potential
onset consonants are not underlyingly in a single cluster. Section 5 considers
the behaviour of labial clusters where the output onset consonant is not the
same as any of the input consonants. The labial cluster data are used to show
that G’s preferred method of satisfying *Complex is through coalescence.
This section puts together the effects of sonority, *Complex, faithfulness
constraints on features, and faithfulness constraints on segments. Section 6
looks at the exceptions to the pattern of labial coalescence in section 5 and shows
that these supposed exceptions provide a clear example of the Emergence of the
Unmarked in the interaction between the OCP and the faithfulness constraints.
The OCP is dominated and thus disabled in certain cases, but emerges to play
a crucial role in others.
Before beginning the analysis of G’s onsets, I present in section 2 a brief
outline of Optimality Theory and the view of faithfulness presupposed in the
rest of this chapter.
candidate2 *
For a hypothetical input /skat/ and candidate output ska, the output incurs
one violation of Max-IO, but no Ident[F] violations. The candidate output
skaʔ does not incur a Max violation if the final glottal stop is derived from the
input t. It does, however, violate Ident[Coronal], as the [Coronal] feature of
the input t is not present on its correspondent ʔ. The correspondence between
segments which are not featurally identical can be made explicit by coindexing,
as in input s1 k2 a3 t4 and output s1 k2 a3 ʔ4 .
(3)
                  *Complex    Faith
please:
  pliz → pliz     *!
  pliz → piz                  *
peas:
  piz → piz
  piz → iz                    *!
The situation is more complex than tableau (3) indicates, however. Since
*Complex requires simply that the onset contain maximally one segment,
either the p or the l could have been deleted in ‘please’. Although the first
consonant is the surviving one in the examples in (2a), the examples in (4)
show that this is not always the case.
In the examples in (4a), the initial s is lost. This is not true of all initial ss,
however, as (4b) shows. When the s is less sonorous than the following con-
sonant (nasal or liquid) it is retained in the output. If the s is more sonorous
than the following consonant (plosive), it is deleted. Similarly, it is the more
sonorous consonant in (2a) that is deleted. Thus it is only the least sonorous of
a string of onset consonants that is present in G’s output. This means that G
produces syllables that optimise syllable shape not only with respect to segment
number (restricting to one onset consonant), but also with respect to sonority
requirements. The optimal syllable begins with an onset of low sonority and is
followed by a vowel.
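The selection of the surviving consonant can be sketched in a few lines of code. The following is a minimal illustration, not part of the analysis itself: the numeric sonority ranks and the function name `reduce_onset` are assumptions introduced here, and the segment classes follow the usual ordering from stops (least sonorous) to glides (most sonorous).

```python
# Hypothetical sonority ranks for onset consonants: higher = more sonorous.
# The classes and their order are an illustrative assumption.
SONORITY = {}
for rank, segments in enumerate([
    "ptk",  # voiceless stops (least sonorous)
    "bdg",  # voiced stops
    "fs",   # voiceless fricatives
    "vz",   # voiced fricatives
    "mn",   # nasals
    "rl",   # liquids
    "wy",   # glides (most sonorous)
]):
    for seg in segments:
        SONORITY[seg] = rank


def reduce_onset(cluster: str) -> str:
    """Return the least sonorous consonant of an onset cluster."""
    return min(cluster, key=SONORITY.__getitem__)


# Clusters written as simple letter strings for illustration.
for word, cluster in [("please", "pl"), ("snow", "sn"), ("sleep", "sl"),
                      ("draw", "dr"), ("friend", "fr")]:
    print(word, cluster, "->", reduce_onset(cluster))
```

The sketch models only which consonant survives; it says nothing about why the grammar prefers that consonant, which is the job of the ranked constraints discussed in the text.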
G’s reduction of multiple onset consonants to a single onset consonant in
the mapping between input and output is analogous to the reduction of onset
clusters in the mapping between base and reduplicant in Sanskrit. In Sanskrit
reduplication a single-onset syllable is prefixed to the verb stem. The segments
of the prefixal syllable are determined by the stem. With minor changes such as
deaspiration, the least sonorous onset consonant of the stem begins the prefix
and is followed by the vowel. Examples of perfect reduplication are shown
in (5a), while (5b), (5c), and (5d) show forms in the aorist, intensive, and
desiderative, respectively (from Whitney 1889, who does not supply glosses).
Markedness and faithfulness 79
In the examples in the left column the first consonant is least sonorous and thus
appears in the reduplicant. In the right column it is the second consonant that
is least sonorous and shows up in the reduplicant.
(5) (a) perfect
        pa-prach            ta-stha
        ši-šri              cu-šcut
        sa-sna              pu-sphuṭ
        su-sru
        ši-šliṣ
    (b) aorist
        a-ti-trasam         a-pi-spršam
        a-si-ṣyadam
    (c) intensive
        ša-švas             kani-ṣkand ∼ cani-ṣkand
        ve-vli
        po-pruth
    (d) desiderative
        šu-šruṣa            ti-stirṣa
As the examples in (5) show, the onset reduction occurs in all cases of redu-
plication in Sanskrit. Onset reduction is thus not a special rule of some mood
or tense but rather a general property of Sanskrit reduplicants. In Correspon-
dence Theory, the reduplicant is related to the base by a mapping analogous
to the mapping between the output and the input. Sanskrit reduplicants obey
*Complex at the cost of violating faithfulness to the base, while in G’s lan-
guage the outputs obey *Complex at the cost of violating faithfulness to the
input.
The difference between Sanskrit reduplication and G’s treatment of onset
clusters in (2) and (4) is that in the Sanskrit case the optimised syllable of the
prefixed reduplicant stands in relation to the more complex verb stem base,
while in G’s speech the output unmarked syllable stands in relation to the more
complex input form. G’s grammar ranks *Complex above Faith-IO, the
constraint requiring faithfulness of the output segments to the input segments.
This was shown in tableau (3). In Sanskrit Faith-IO outranks *Complex,
since underlying /stha/ surfaces as stha and not *tha or *sa. Faith-BR, the
constraint requiring faithfulness of the reduplicant to the base, is dominated
by *Complex, with the result that the reduplicant may not have the base’s
complex onset.6
Sanskrit reduplication is a typical case of the Emergence of the Unmarked
discussed in McCarthy and Prince (1994) (compare Steriade’s 1988 discussion
of unmarkedness in Sanskrit reduplicants). In non-reduplicative environments
The constraint rankings in (6) account for the reduction of complex onsets
to single segments in G’s words and in Sanskrit reduplicants. What is not
yet accounted for is the choice of which segment survives in the output or
reduplicant. In selecting the onset, G and Sanskrit display unmarkedness. The
least marked syllable onset is the least sonorous one. An optimal onset consists
of a voiceless stop. A voiced stop is the next best, and so on down the universal
Sonority Hierarchy in (7).7
(8) /a » /e,o » /i,u » /r,l » /m,n » /v,z » /f,s » /b,d » /p,t
is assured by the constraint Onset, which demands that every syllable have
an onset.11 This is demonstrated in tableau (9) for examples from G’s speech
(only the relevant part of the /Y sub-hierarchy is shown). The effect of
*Complex (shown above in tableau (3)) is assumed, so only candidates without
complex onsets are considered. All the candidates shown violate Faith-IO
in addition to the violations shown in (9), but the ranking of Faith-IO with
respect to the sub-hierarchy of constraints in (9) cannot be determined on the
basis of the data presented thus far. The crucial point is that given the
undominated ranking of *Complex, Faith-IO must be violated in complex onset
cases, and the sonority constraints take advantage of this forced violation to
minimise /Y sonority violations.
(9)
                    Onset  /r,l   /m,n   /v,z   /s,f   /b,d,g
sky:
  sgay → say                                    *!
  sgay → gay                                           *
  sgay → ay         *!
snow:
  sno → so                                      *
  sno → no                        *!
  sno → o           *!
sleep:
  slip → sip                                    *
  slip → lip               *!
  slip → ip         *!
draw:
  drɔ → dɔ                                             *
  drɔ → rɔ 12              *!
  drɔ → ɔ           *!
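The evaluation in tableau (9) can be restated procedurally. The sketch below is an illustration under simplifying assumptions, not an implementation of any existing OT software: each constraint is modelled as the set of onsets it penalises, an empty string stands for a missing onset, and the names `RANKING`, `profile`, and `optimal` are hypothetical.

```python
# Constraints in ranked order (highest first), following tableau (9).
# Each constraint is modelled as the set of onsets it penalises;
# "" stands for an empty onset, which violates Onset.
RANKING = [
    ("Onset", {""}),
    ("/r,l", {"r", "l"}),
    ("/m,n", {"m", "n"}),
    ("/v,z", {"v", "z"}),
    ("/s,f", {"s", "f"}),
    ("/b,d,g", {"b", "d", "g"}),
]


def profile(onset):
    """Violation counts for one candidate, ordered by constraint rank."""
    return tuple(int(onset in penalised) for _, penalised in RANKING)


def optimal(candidates):
    """Return the (onset, output) pair with the least violation profile.

    Tuples are compared left to right, which mirrors strict domination:
    a violation of a higher-ranked constraint outweighs any violations
    of lower-ranked ones.
    """
    return min(candidates, key=lambda cand: profile(cand[0]))


# Candidates from tableau (9), as (onset, output) pairs.
TABLEAU_9 = {
    "sky": [("s", "say"), ("g", "gay"), ("", "ay")],
    "snow": [("s", "so"), ("n", "no"), ("", "o")],
    "sleep": [("s", "sip"), ("l", "lip"), ("", "ip")],
    "draw": [("d", "dɔ"), ("r", "rɔ"), ("", "ɔ")],
}

for word, candidates in TABLEAU_9.items():
    print(word, "->", optimal(candidates)[1])
```

Comparing violation profiles as tuples is a convenient shorthand for strict ranking; it works here because every candidate incurs at most one mark per constraint.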
This section has demonstrated the power of OT to capture both child and
adult phonology. OT provides a set of universal constraints which are operative
in both children and adults. Comparison of G’s phonology with that of Sanskrit
shows that G is using the same constraints as those present in adult language.
This was shown by the similar action of *Complex and the sonority constraints
in Sanskrit and in G’s language.
Comparison of English and G’s language demonstrates the innateness of the
phonological constraints. G uses the constraint *Complex in spite of the fact
that English, her target language, provides her with no evidence for its existence.
English contains a high percentage of words with complex onsets, so that G
would not postulate *Complex as a property of her target language. G does
use *Complex, however, and ranks it highly enough that it is never violated
in her language. By the ranking of *Complex over Faith-IO (as shown in
tableau (3)) English ‘straw’, ‘please’, and ‘friend’ become dɔ, piz, fεn.
In Sanskrit *Complex is ranked less highly. It is dominated by Faith-IO
but it outranks Faith-BR. The result is that the effects of *Complex emerge
in reduplicants but not in bases, yielding reduplicant-base forms such as
ta-stha. Sanskrit, G, and English provide a paradigm of rankings of the universal
constraint *Complex with respect to faithfulness constraints. In G’s language
*Complex is undominated and hence unviolated. In Sanskrit *Complex is
dominated but still active where the dominating constraint (here Faith-IO) is
irrelevant, demonstrating Emergence of the Unmarked. In English *Complex
is dominated and never active.13
G’s ranking of *Complex over Faith-IO in the face of the opposite rank-
ing in English is in keeping with the proposal made in section 1 that initially
markedness constraints outrank faithfulness constraints. To acquire the English
ranking she will have to promote Faith-IO above *Complex. The sonor-
ity constraints are at this point already dominated by faithfulness constraints,
since very sonorous consonants can be onsets. When there is a choice of onset
segments, though, the effects of sonority emerge, selecting s in so ‘snow’, but
d in dɔ ‘draw’ (as shown in tableau (9)).
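The claim that acquisition consists in re-ranking the same constraints can be made concrete with a small sketch. It is offered only as an illustration: the two constraint functions are deliberately crude (Faith-IO counts deleted segments and nothing else), the vowel inventory is a placeholder, and all names are hypothetical.

```python
VOWELS = set("aeiouɔε")  # placeholder vowel set, for illustration only


def star_complex(inp, out):
    """*Complex: one violation if the output begins with 2+ consonants."""
    onset_len = 0
    for seg in out:
        if seg in VOWELS:
            break
        onset_len += 1
    return int(onset_len > 1)


def faith_io(inp, out):
    """Faith-IO, simplified: one violation per deleted input segment."""
    return len(inp) - len(out)


def winner(inp, candidates, ranking):
    """Return the candidate with the least violation profile under ranking."""
    return min(candidates, key=lambda out: tuple(c(inp, out) for c in ranking))


G_RANKING = [star_complex, faith_io]        # *Complex >> Faith-IO
ENGLISH_RANKING = [faith_io, star_complex]  # Faith-IO >> *Complex

print(winner("pliz", ["pliz", "piz"], G_RANKING))        # piz
print(winner("pliz", ["pliz", "piz"], ENGLISH_RANKING))  # pliz
print(winner("piz", ["piz", "iz"], G_RANKING))           # piz
```

The constraint functions and the candidate sets are identical in both calls for ‘please’; only the order of the two constraints differs, which is the sense in which promoting Faith-IO above *Complex yields the adult form.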
segments of the output form a contiguous string.17 The mapping from input to
output forms is shown in (11), where ‘koala’ is compared with ‘Rebecca’. Bold-
face lines show segmental correspondence between input and output. Note that
these are not association lines, but simply the equivalent of coindexing between
input and output corresponding segments. Dotted lines show correspondence
between input and output syllables. Light lines show the association of segments
with syllables.
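(11) (a) Input:   koɑlɑ            (b) Input:   rəbεkə
         Output:  fikɑlɑ               Output:  fibεkə
     [Diagram: three syllables (σ σ σ) above each input and below each output, with lines marking segmental and syllabic correspondence.]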
In (11) both outputs violate Faith, but (11a) also violates I-Contig, since
the segments of the input which participate in the correspondence from input
to output do not form a contiguous string. The transfer of a better onset to the
second syllable in defiance of the contiguity constraint occurs in cases where
the adult syllable is onsetless or begins with a glide or liquid. It does not occur
if the adult syllable begins with a nasal or an obstruent. Compare (12a) with
(12b).
(12) (a) balloon   [fi-bun]        (b) tomorrow  [fi-mɑwo]
         police    [fi-pis]            potato    [fi-teɾo]
         below     [fi-bo]             Simone    [fi-mon]
         barrette  [fi-bεt]
         koala     [fi-kɑlɑ]
Onset replacement occurs in cases where the input stressed syllable has a
glide or liquid onset, but not if that onset is any less sonorous.18 Unlike the
onset reduction discussed in section 3, the fi-related onset replacement does
not occur in all cases where the unstressed syllable has a less sonorous onset
than the stressed syllable. This is demonstrated in forms such as ‘Simone’. As
section 3 showed, contiguous s-nasal clusters lose the nasal in favour of the
fricative s. As a result ‘snow’ is pronounced so. In the non-contiguous fi- cases
a nasal remains and is not replaced by a fricative: ‘Simone’ is fi-mon, not
*fi-son. The data in (12) thus show that although sonority constraints are at work in
both the contiguous cluster cases of section 3 and the non-contiguous fi- cases,
the faithfulness constraint which interacts with the sonority constraints must
be different in the two types. I-Contig is therefore separate, and separately
ranked from the Faith of section 3.19 Violation of I-Contig is forced by
/glide (12a), but not by the lower-ranked /Y constraints (12b). /liquid may
also dominate I-Contig. It is difficult to tell, since all onset rs are pronounced
as ws and many (but not all) onset ls are ys. In what follows, /glide should be
considered to include /liquid. Faith-IO, on the other hand, is dominated by
*Complex and unrankable with respect to the /Y sub-hierarchy.
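The descriptive generalisation behind (12) can be summarised as a simple decision procedure. This is only a sketch of the pattern, not of the constraint interaction that derives it: the function name is hypothetical, segments are written as plain letters, and cases involving coalescence are deliberately left out.

```python
GLIDES_AND_LIQUIDS = set("wyrl")


def fi_form_onset(unstressed_onset, stressed_onset):
    """Onset of the stressed syllable in a fi- form, following (12).

    The onset of the replaced (unstressed) syllable is transferred only
    when the stressed syllable is onsetless ("") or begins with a glide
    or liquid; nasals and obstruents stay in place.
    """
    if stressed_onset == "" or stressed_onset in GLIDES_AND_LIQUIDS:
        return unstressed_onset or stressed_onset
    return stressed_onset


# (unstressed onset, stressed onset) pairs for the words in (12).
for word, onsets in [("balloon", ("b", "l")), ("police", ("p", "l")),
                     ("koala", ("k", "")), ("tomorrow", ("t", "m")),
                     ("potato", ("p", "t")), ("Simone", ("s", "m"))]:
    print(word, "->", "fi-" + fi_form_onset(*onsets) + "...")
```

In constraint terms, the first branch corresponds to the cases where Onset or /glide forces a violation of I-Contig, and the second to the cases where the lower-ranked /Y constraints cannot.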
Tableau (13) illustrates the action of sonority requirements and I-Contig in
the presence of fi-insertion. For the purposes of this tableau, all candidates are
assumed to have fi-, and the mechanism responsible for the fi- is not included
here.
(13)
balloon:
fi-bun * *
fi-yun *!
Rebecca:
fi-b ε kə * *
fi-w ε kə *!
koala:
fi-kɑ lɑ * *
fi-ɑ lɑ *!
fi-wɑ lɑ *!
In spite of the fact that G’s use of fi- leads to significant divergence from
the segments of the adult unstressed syllable, the interaction between fi- and
the sonority requirements on the following onset shows that G’s inputs coin-
cide segmentally with the adult forms. When required by Onset or /glide,
the onset of the syllable replaced by fi- shows up on the next syllable, demon-
strating that the replaced segments are still present and available for use by
the phonology. Furthermore, the presence of the dummy syllable fi- shows
that her inputs are prosodically accurate even when her outputs are segmen-
tally divergent. Although the segmental content of the fi- syllable is idiosyn-
cratic, the constraints with which it interacts are not. The constraints that
select G’s output, namely the /Y constraints and I-Contig, are precisely
those which have been independently proposed as universally existing in adult
grammar.
Just as in the case of Sanskrit, G’s use of fi- is analogous to a phenomenon
found in adult natural languages. More specifically, it resembles melodic
overwriting such as that commonly seen in echo word formation (McCarthy and
Prince 1986, 1990, 1995b, Steriade 1988, Yip 1992). Echo words are formed
by reduplication with a fixed melody replacing part of the base melody in the
reduplicative morpheme. This is shown in (14) for echo words in Kolami (from
McCarthy and Prince 1986, citing Emeneau 1955).
In Kolami, the melodic overwriting replaces the first consonant (if any) and the
first vowel (long or short) of the base. Coda consonants after the first vowel
remain faithful to the base. In G’s language she overwrites a whole syllable
whether open or closed. Thus both the chris in ‘Christina’ and the re of ‘Rebecca’
are replaced by fi-. The difference between adult language overwriting and G’s
overwriting is that in G’s overwriting a whole syllable is replaced, while in
adult overwriting only segments are replaced. Kolami therefore has pal-gil,
replacing the first two segments, but leaving the coda l of the base to show
up after the overwriting gi. G, on the other hand, has forms such as fi-tenə
‘container’, which replaces the con syllable with the codaless fi-, rather than
*fi-ntenə, which leaves the coda. If G’s inputs possess prosodic structure and if
adult inputs do not (see again note 16), this difference is not surprising. In G’s
language the initial unstressed syllable is required to stand in correspondence to
a syllable whose segments are fi-. If adults do not have syllables in their inputs
they cannot overwrite whole syllables (as G does with fi-), but only segments.20
In each case in (15) the last consonant in the adult onset is labial (in the case
of w) or labialised (in the case of r).22 G’s output consonant is also labial.
(See Allerton (1976), and Chin and Dinnsen (1992) for similar patterns of clus-
ter simplification where [labial] dominates.) In voicing, stricture, and nasality,
however, it matches the least sonorous onset consonant, the one which the facts
in section 3 would lead us to expect in the output. The consonants in (15) coa-
lesce, with the sonority of one consonant showing up with the place of another.
The correspondence between input and output for ‘tree’ is shown in (16). The
indices show that both the t and the rw in the input are mapped to the p of the
output. The affected features are shown for each segment.23
(16) Input:   t1                          rw2
              [-sonorant, -continuant]    [labial]
     Output:  p1,2
In pi ‘tree’, the p corresponds to both the /t/ and the /rw/ of the input, since
it bears features of each.24 By coalescing the segments G can avoid deleting
either one of the segments while still obeying the high-ranking *Complex.
Such a view of coalescence as a strategy to preserve the input segments is in
contrast with a spreading and deletion model of coalescence (such as Chin and
Dinnsen 1992) where one of the consonants in the cluster is deleted. Since one
of the segments is lost anyway, the motivation for the spreading of features is
arbitrary.
In OT coalescence makes sense as a way of remaining faithful to the input
segments even in the face of *Complex. Coalescence is not a cost-free proce-
dure, however. In coalescing these segments G violates No-coalescence
(No-coal), the faithfulness constraint that requires an output segment to cor-
respond to only one input segment.25 The violation of No-coal in reducing
onset clusters demonstrates that No-coal is lower ranked than *Complex.
There is good evidence that these labial cases do represent a coalescence
and not an error in perception. That is, the coalescence occurs in the mapping
between a segmentally accurate input and G’s output. It is not the case that she
hears the consonant cluster as a single labial consonant and therefore has a single
labial in her input onset. The evidence for grammatical coalescence comes from
forms where labial coalescence occurs in the non-contiguous consonants of
fi- words. This is shown in (17).
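(17) 26
     gorilla    [fi-biyə]
     giraffe    [fi-bæf]
     direction  [fi-bεkšən]
     [Diagram: input-output correspondence for ‘giraffe’, with syllable nodes (σ σ) above the input and below the output; the onsets of both input syllables correspond to the output b.]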
In (17) the input onset of the stressed syllable is labial, but it is a glide in G’s
language. The facts in the previous section suggest that it should be replaced
by the onset of the first syllable. As in the contiguous cluster examples in
(15), coalescence occurs instead of outright replacement. The onset of the first
syllable and the onset of the second syllable coalesce, yielding an output onset
with the sonority of the first input onset and the labial place of the second.
In these cases of long-distance coalescence, G could not perceive the two con-
sonants as one, as the adults pronounce them in separate syllables. G clearly
recognises the two syllables as distinct, since she replaces one of them with
fi-. Thus it is not the case that the distinction between the two consonants is
not being perceived. While words like pi (‘tree’) could conceivably stem from
misperception or some other mechanism that produces apparent coalescence
ready-made at the input, the long-distance coalescence facts demand a more ac-
curate input, suggesting that words like pi (‘tree’) have accurate inputs too.27 The
coalescence in labial clusters is best analysed as a grammatical process, which
occurs in the mapping between input and output, rather than a perceptual error
or other process that would provide G with inputs that were already coalesced.
The labial cases provide evidence that the preferred way of avoiding a
*Complex violation is through coalescence (violating No-coal) rather than
outright deletion (violating Max-IO). If deletion were the optimal way of
avoiding *Complex violations, labial glides in input clusters would simply
disappear in the output. This implies that for G it is better to coalesce segments,
violating No-coal, than it is to delete them and violate Max. In terms of
G’s constraint ranking, Max dominates No-coal. By this ranking ‘tree’ is
pi, not *ti, as shown in tableau (18) (as elsewhere, the effects of undominated
*Complex are assumed).
(18)
                          Max     No-coal
tree:
  t1 rw2 i3 → p12 i3              *
  t1 rw2 i3 → t1 i3       *!
Note that Max is not violated in the winning candidate since each input segment
has an output correspondent, as indicated by the indices (see McCarthy and
Prince 1995a, 1999).
The ranking of Max over No-coal would suggest that coalescence is
always better than deletion and so this is always the option that G pursues. In
a case such as kin ‘clean’, however, the l apparently has no correspondent
in the output. It appears as though deletion has taken place, not coalescence. In
other words, it seems that No-coal is not violated but Max is. The answer to
this apparent paradox is in the definition of correspondence of segments. In the
example below in (19), Gen, which freely generates candidates for evaluation
by the constraint hierarchy, generates both the form in (19a) and that in (19b)
as potential outputs of ‘clean’.
(19) (a) ‘clean’  k1 l2 i3 n4  (I)      (b)  k1 l2 i3 n4  (I)
                  k1    i3 n4  (O)           k12   i3 n4  (O)
In (19a) Gen deletes the /l/ of the input, incurring a Max violation.28 In (19b)
Gen maps both the input /k/ and the /l/ onto the output k. The output k is thus
the correspondent of both the underlying /k/ and the /l/, as emphasised by the
indices. No segment has been deleted, since all input segments have an output
correspondent. Therefore the candidate in (19b) violates No-coal but not
Max. The output (19b) also violates a number of Ident[F] constraints, the
class of constraints discussed in section 2 which control featural faithfulness.
In fact none of the l’s features shows up on its output correspondent, and the
coalescence is vacuous.
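The difference between deletion and (possibly vacuous) coalescence lies entirely in the correspondence relation, and this can be made explicit in code. The sketch below is an illustration under stated assumptions: an output is represented as a list of (segment, set of input indices) pairs, No-coal is assumed to assign one mark per output segment with more than one input correspondent, and all names are hypothetical.

```python
def max_violations(input_size, output):
    """Max: one mark per input index with no output correspondent."""
    realised = set().union(*(indices for _, indices in output))
    return sum(1 for i in range(1, input_size + 1) if i not in realised)


def no_coal_violations(output):
    """No-coal: one mark per output segment with 2+ input correspondents."""
    return sum(1 for _, indices in output if len(indices) > 1)


def winner(input_size, candidates):
    """Pick the best candidate under the ranking Max >> No-coal."""
    return min(
        candidates,
        key=lambda name: (max_violations(input_size, candidates[name]),
                          no_coal_violations(candidates[name])),
    )


# 'clean', input k1 l2 i3 n4: the two candidates of (19).
CLEAN = {
    "(19a)": [("k", {1}), ("i", {3}), ("n", {4})],     # l deleted
    "(19b)": [("k", {1, 2}), ("i", {3}), ("n", {4})],  # k and l coalesced
}
# 'tree', input t1 rw2 i3: the two candidates of tableau (18).
TREE = {
    "ti": [("t", {1}), ("i", {3})],
    "pi": [("p", {1, 2}), ("i", {3})],
}

print(winner(4, CLEAN))  # (19b)
print(winner(3, TREE))   # pi
```

The two candidates for ‘clean’ contain the same string of segments; they differ only in their indices, which is why the choice between them has to be made by constraints on correspondence rather than by inspection of the surface form.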
Such flagrant violation of the Ident[F] constraints is driven mainly by the
sonority constraints mentioned earlier. The /Y sub-hierarchy will pick out
the manner features of the least sonorous input segment to show up on the
coalesced output segment. This leaves the place features as the only way fea-
tures of the more sonorous input consonant could show up on the coalesced
output consonant. In the case of the labial or labialised consonants, the labial
surfaces, as in the examples in (15) above. As forms such as bin ‘green’ tes-
tify, Ident[labial] is higher ranked than Ident[dorsal].29 This is shown in
tableau (20).
(20)
                              Ident[labial]   Ident[dorsal]
green:
  g1 rw2 i3 n4 → b12 i3 n4                    *
  g1 rw2 i3 n4 → g12 i3 n4    *!
(21)
                              Ident[labial]   Ident[coronal]
tree:
  t1 rw2 i3 → p12 i3                          *
  t1 rw2 i3 → t12 i3          *!
The rankings in (20) and (21) assure that [labial] will show up when it
is part of a coalesced sequence. The relative ranking of Ident[dorsal] and
Ident[coronal] is demonstrated by forms such as kin ‘clean’. Ident[dorsal]
is higher ranked than Ident[coronal], since the [dorsal] of /k/ rather than the
[coronal] of /l/ surfaces. Tableau (22) illustrates this ranking.
(22)
                              Ident[dorsal]   Ident[coronal]
clean:
  k1 l2 i3 n4 → k12 i3 n4                     *
  k1 l2 i3 n4 → t12 i3 n4     *!
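Taken together, tableaux (20) to (22) give the ranking Ident[labial] » Ident[dorsal] » Ident[coronal], and the choice of place for a coalesced stop follows from it. The sketch below illustrates this under simplifying assumptions: only place and voicing are modelled, the output is restricted to the six plain stops, and the names `ident_profile` and `coalesce` are hypothetical.

```python
# Ident[place] constraints in ranked order, from tableaux (20)-(22).
PLACE_RANKING = ["labial", "dorsal", "coronal"]

# Stops indexed by (voicing, place); an illustrative inventory only.
STOPS = {
    ("voiceless", "labial"): "p", ("voiced", "labial"): "b",
    ("voiceless", "coronal"): "t", ("voiced", "coronal"): "d",
    ("voiceless", "dorsal"): "k", ("voiced", "dorsal"): "g",
}


def ident_profile(input_places, output_place):
    """One mark on Ident[F] for each input place F missing from the output."""
    return tuple(int(place in input_places and place != output_place)
                 for place in PLACE_RANKING)


def coalesce(voicing, input_places):
    """Return the coalesced stop whose place best satisfies the ranking."""
    place = min(PLACE_RANKING, key=lambda p: ident_profile(input_places, p))
    return STOPS[(voicing, place)]


print(coalesce("voiceless", {"coronal", "labial"}))  # tree:  p
print(coalesce("voiced", {"dorsal", "labial"}))      # green: b
print(coalesce("voiceless", {"dorsal", "coronal"}))  # clean: k
```

The voicing argument stands in for the manner features that the /Y sub-hierarchy selects from the least sonorous input consonant; the sketch takes that selection as given.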
(27)
                          Ident[labial]   OCP
blue:
  b1 l2 u3 → b12 u3                       *
  b1 l2 u3 → d12 u3       *!
room:
  w1 u2 m3 → w1 u2 m3                     *
  w1 u2 m3 → y1 u2 m3     *!
that is deleted. If there is a segment that can safely be deleted, the high-ranking
Ident[labial] can be bypassed. If the entire labial segment is deleted, the OCP
is also satisfied. Deletion is therefore the best option in this case because it
satisfies the OCP and Ident[labial], although it violates Max instead. This
demonstrates that the OCP dominates Max, as shown in tableau (28) (which
considers only forms which do not violate Ident[labial], based on the ranking
in (27)).
(28)
                          OCP   Max
grow:
  g1 rw2 o3 → g1 o3             *
  g1 rw2 o3 → b12 o3      *!
If labials can delete to avoid violating the OCP, one would expect the b
to delete in bu ‘blue’, yielding *lu (or *yu). It turns out that true consonants
may not delete, although glides (and liquids) can.35 Compare the forms in (25)
and (26), where /rw/ deletes, with that in (29a), where the labial consonant
coalesces.36
Tableau (29b) shows that, given the current constraint ranking, the incorrect
candidate *lu (with deletion of the input labial /b/) is the winner. The con-
straint Max in tableau (28) must therefore be parameterised to include glides
(and liquids) but not true consonants. I shall call this constraint Max(GL). It
demands that all glides and liquids present in the input must be present in the
output. The forms in (29) obey the constraint Max(cons), the analogous con-
straint for true consonants. Max(cons) is ranked above the OCP, since true
consonants cannot be deleted under pressure from the OCP. This is shown in
tableau (30).
(30)
b in o b u
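The rankings argued for in this section can be checked against one another by evaluating a few candidate sets under a single hierarchy. The sketch below is illustrative only. The violation marks are entered by hand from the discussion of tableaux (18), (27), and (28); the relative order of Max(cons) and Ident[labial] is not fixed by that discussion and is chosen arbitrarily; constraints outside this hierarchy are ignored; and the deletion candidate gin for ‘green’ is a hypothetical competitor added for comparison.

```python
# Assumed hierarchy, highest-ranked first (see the caveats above).
RANKING = ["Max(cons)", "Ident[labial]", "OCP", "Max(GL)", "No-coal"]

# Candidate outputs with the constraints each one violates.
CANDIDATES = {
    "grow": {"go": {"Max(GL)"}, "bo": {"OCP", "No-coal"}},
    "green": {"bin": {"No-coal"}, "gin": {"Max(GL)"}},
    "blue": {"bu": {"OCP", "No-coal"},
             "lu": {"Max(cons)"},
             "du": {"Ident[labial]", "No-coal"}},
    "room": {"wum": {"OCP"}, "yum": {"Ident[labial]"}},
}


def profile(violated):
    """Violation profile of a candidate, ordered by constraint rank."""
    return tuple(int(constraint in violated) for constraint in RANKING)


def winner(word):
    """Return the candidate for word with the least violation profile."""
    candidates = CANDIDATES[word]
    return min(candidates, key=lambda out: profile(candidates[out]))


for word in CANDIDATES:
    print(word, "->", winner(word))
```

Under this hierarchy the OCP decides the outcome only for ‘grow’, where a glide can be deleted at the cost of the low-ranked Max(GL); in ‘blue’ and ‘room’ it is violated because the only alternatives violate higher-ranked constraints.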
The non-coalescing cases such as go ‘grow’ provide yet more evidence for
segmentally accurate inputs. Although ‘green’ and ‘bean’ are homophonous
for G – both are bin – r and b must be separate at underlying representation,
since the OCP distinguishes between them. If the following vowel is rounded,
the OCP forces a labial glide to be deleted instead of coalescing. Thus ‘grow’ is
go, not *bo. Due to the high ranking of Ident[labial], however, an underlying
b surfaces, so ‘boat’ is bot. G must therefore distinguish between underlying
r and b. It cannot be maintained that the difference between ‘green’ (which
coalesces) and ‘grow’ (which deletes rw) lies in perception and that the labial
consonant). A default rule would presumably supply segments with the default
place [coronal] where necessary. The coalescence of labial onsets would also be
the result of a rule. No matter which order [labial] dissimilation and coalescence
are applied in, the wrong result obtains. The two orders of application are shown
in (34).
(34) (a)                  green    grow    room    small
     Input                grwin    grwo    rwum    smɔl
     Delete Labial        grwin    go      yum     sɔl
     (+ default [cor])
     Coalesce             bin      go      yum     sɔl
     Output               bin      go      *yum    *sɔl
OT, on the other hand, succeeds in describing the complex behaviour of the
OCP through the interaction of constraints which are in themselves simple and
independently motivated. It captures the emergence of the unmarked in the
action of the OCP because it represents the OCP not as the motivation for an
idiosyncratic ordered rule or as a surface-true constraint but as one constraint
among many which will be violated when it is dominated by the appropriate
constraint. The dominating constraints ensure that the OCP in this case emerges
only in a very restricted environment. The OCP effects in G’s language show
that child phonology, like adult phonology, is best described by a hierarchy of
ranked, violable constraints as provided by Optimality Theory.
8. Conclusion
This chapter has demonstrated that Optimality Theory succeeds in describing
child phonology by ascribing to the child’s grammar the same constraints and
ranking properties as are required for adult languages. In particular, section 3
showed how G’s treatment of onsets is driven by the same constraints as onset
simplification in Sanskrit reduplication, while section 5 showed that G’s pattern
of coalescence is like that in Navajo.
An OT model of child phonology neatly accounts for the fact that the child
frequently derives outputs which diverge quite strongly from the adult forms.
The divergence stems from a difference in constraint rankings between the child
language and the target language. It need not lie in a difference in input forms, as
this child’s inputs have been shown to be segmentally accurate. The evidence
for this came from the facts of onset replacement in fi- words, described in
section 4.
The fact that the child language displays less markedness than the adult
language is due to the initial state of the grammar, in which the marked-
ness constraints are ranked above the faithfulness constraints, as proposed in
section 1. As the phonology develops, the faithfulness constraints move upward
as required to approximate the adult language. At the stage of the data in this
chapter, G still ranked the markedness constraints *Complex, /glide, and
the OCP above the faithfulness constraints Max, No-coal and I-Contig.
*Complex was at this point still undominated by any faithfulness constraints.
As children begin to promote the faithfulness constraints, their phonology is
predicted to display cases of the Emergence of the Unmarked. This was shown
to be the case for G’s ranking of the OCP in section 6. In some cases the OCP
played a crucial role in selecting the output, while in others it was rendered
powerless by the demands of the constraints which dominate it. G’s treatment
of labial clusters could not be captured using a system of unranked constraints,
while a rule-based derivation was arbitrary in its connection to the OCP. Only
OT, which imposes a ranking on a set of violable constraints, could account for
the OCP’s behaviour in G’s language.
This chapter has shown that the application of OT to acquisition allows both
child and adult language to be analysed using the same model of phonology,
and using the same constraints. This unity of approach has in general not been
achieved in other models of child phonology. For comparison, consider three
alternative models of child phonology. Smith (1973) considers the child’s sys-
tem of phonology a mapping from an adult-like underlying form to the child’s
surface form. Working in a rule-based framework inspired by SPE (Chomsky
and Halle, 1968), he ascribes to the child a long series of ‘realisation rules’ de-
riving the child’s output forms from the adult-like underlying forms. By using
such rules Smith attempts to describe child language in the model used for adult
language. Using this model has the unsatisfactory result that the child has more
phonological rules than the adults do. The child is seen as having formulated
a large number of rules for which he has never received any evidence. Also,
as Smith (1973: 176 ff.) and Ingram (1976, 1989a) point out, many of these
rules have the same purpose. For example, 7 of 23 rules have the function of
eliminating consonant clusters.
Ingram (1976, 1989a, 1989b) avoids the proliferation of formally unrelated
rules by ascribing to the child an extra phonological level. In his model the
three levels of child phonology are perception, organisation, and production.
Constraints may operate at each level. Perceptual constraints, such as the in-
ability to perceive coda consonants, may be translated into organisational rules
or production rules after the child becomes able to perceive codas. For instance,
if a child is at first unable to perceive codas she may hypothesise that all words
consist of CV syllables. She will then map new words onto this CV pattern
at the organisation level even after she has begun to perceive codas. Once the
child can process and produce coda consonants at all phonological levels, the
rule is suppressed.
Stampe (1973) uses a two-level model like Smith, with adult-like inputs,
but the outputs are derived by a set of innate natural processes which serve to
reduce markedness. In the earliest stage of acquisition the processes operate
freely whenever their conditions are met. As the phonology develops, these
processes are ordered (and re-ordered), limited in application, or suppressed
as needed to approximate more and more closely the adult grammar. Thus a
child with only CV syllables would derive them from adult CVC syllables by
a natural process deleting coda consonants. If the child’s target language has
codas the natural process which deletes them is eventually suppressed.
Ingram’s model is one in which constraints or rules operate on both the input
(organisation) and the output (production). Smith’s is one in which a long series
of ordered rules operate directly on the input to derive the output. Stampe’s
model provides universal rules, but still relies on serial derivation of the output
102 Amalia Gnanadesikan
from the input. All three give the child more tasks and hence a larger processing
load than the adult has. Smith gives the child more rules, Ingram gives her an
extra level, and Stampe and Ingram both ascribe to the child processes that
are later suppressed. Stampe’s model is closest to that provided by OT in that
it proposes innate processes. The difference is that it relies on ordering, not
ranking, as OT does. Such ordered derivations will fail at a natural description
of the Emergence of the Unmarked, as noted in section 6.
By contrast, Optimality Theory does not give the child an extra processing
load. The child does not have more rules or more levels than the adult. In the
OT framework the phonology is one of output constraints which are universal,
and hence shared by children and adults. The output candidates evaluated by
children and adults are the same, but children select different winning candidates
than do adults. Re-ranking of the constraints provides the means by which the
phonology develops. Certain rankings of the constraints will cause the child’s
language to exhibit Emergence of the Unmarked. Such cases provide strong
evidence that child phonology is best modelled by a hierarchy of constraints as
in OT.
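The claim that children and adults evaluate the same candidates against the same constraints, and differ only in ranking, can be illustrated with a minimal Python sketch. It is not part of the original analysis: the two toy constraints (a simplified *Complex and a length-based stand-in for Max), the function names, and the two-member candidate set are assumptions made for illustration. The forms /pliz/ and piz come from note 19; for simplicity the sketch treats the reduction as plain deletion, whereas the chapter analyses it as coalescence.

```python
from typing import Callable, List, Sequence

# A constraint maps an (input, candidate) pair to a violation count.
Constraint = Callable[[str, str], int]

VOWELS = set("aeiou")  # toy inventory; an assumption of this sketch only


def star_complex(inp: str, cand: str) -> int:
    """Toy markedness constraint: one violation per pair of adjacent consonants."""
    return sum(
        1 for a, b in zip(cand, cand[1:])
        if a not in VOWELS and b not in VOWELS
    )


def max_io(inp: str, cand: str) -> int:
    """Toy faithfulness constraint: one violation per input segment lost."""
    return max(0, len(inp) - len(cand))


def evaluate(inp: str, candidates: Sequence[str], ranking: List[Constraint]) -> str:
    """Return the candidate with the best violation profile under the ranking.

    Profiles are compared lexicographically, so one violation of a
    higher-ranked constraint outweighs any number of lower-ranked violations.
    """
    return min(candidates, key=lambda c: tuple(con(inp, c) for con in ranking))


CANDIDATES = ["pliz", "piz"]            # same candidate set for child and adult
CHILD_RANKING = [star_complex, max_io]  # markedness >> faithfulness
ADULT_RANKING = [max_io, star_complex]  # faithfulness >> markedness

print(evaluate("pliz", CANDIDATES, CHILD_RANKING))  # piz
print(evaluate("pliz", CANDIDATES, ADULT_RANKING))  # pliz
```

Only the order of the two constraints differs between the two calls; the constraint definitions and the candidate set are shared, which is the sense in which no extra machinery is ascribed to the child.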
Notes
* This is a revised version of a 1995 University of Massachusetts, Amherst paper. I
should like to thank Gitanjali for cheerfully providing the data for this paper and
occasionally allowing me time to work on it. I am also grateful to John McCarthy,
Linda Lombardi, Joe Pater, Lisa Selkirk, and Shelley Velleman (and three anonymous
referees for this volume) for much needed suggestions and criticisms. Remaining
errors are my own. This work was supported in part by NSF grant SBR–9420424.
1. [Editors’ note] For different views, see Boersma (1998) and Hayes (1999).
2. A similar proposal, namely that constraints requiring unmarked prosodic forms ini-
tially outrank constraints requiring faithfulness to input segments, is made by Demuth
(1995).
3. [Editors’ note] Acquisition as the demotion of markedness constraints rather than the
promotion of faithfulness constraints is discussed in detail by Tesar and Smolensky
(1998, 2000).
4. Like other children (Macken 1980) G uses a few forms which can only be explained
by assuming ‘incorrect’ inputs, i.e., inputs with different segments than the adults
have. Presumably these forms originate in misperceptions. Such words are very few
in G’s speech, however. The vast majority of G’s words can be systematically derived
with the assumption of ‘correct’ inputs.
5. [Editors’ note] See Goad and Rose’s contribution to this volume, and references cited
there, for more elaborate remarks on the behaviour of initial clusters.
6. In Sanskrit Faith-IO and Faith-BR are, respectively, the Max-IO and Max-
BR of McCarthy and Prince (1995a). I use the more general Faith to facilitate
comparison with G’s language.
7. For the use of sonority to optimise syllable shapes in OT, see Prince and Smolensky
(1993). For other references establishing the sonority hierarchy, see Sievers (1881),
Jespersen (1904), de Saussure (1916), Zwicky (1972), Hankamer and Aissen (1974),
Hooper (1976), Steriade (1982), and Selkirk (1984) among others.
Markedness and faithfulness 103
8. Since G produces s+stop clusters as a simple voiced stop, I assume that she
has assigned the neutralised voiceless unaspirate of this position to the voiced
(unaspirated) phoneme. This is not uncommon, although children vary as to
whether they treat these unaspirated stops as voiced or voiceless (Bond and Wilson
1980).
9. The constraint family in (8) is closely related to Prince and Smolensky’s (1993)
*Margin hierarchy which prohibits segments from being parsed as syllable mar-
gins (forcing them to be parsed as nuclei). The difference between the *Margin
hierarchy and the /Y hierarchy is that by the use of moraic versus nonmoraic po-
sitions (instead of nuclear versus margin positions) the /Y hierarchy accounts for
the fact (discussed by Zec 1988) that codas are of high sonority relative to onsets.
10. Given the universal nature of the Sonority Hierarchy, the ranking of the constraints
in (8) should also be universal.
11. Recall that G’s speech has vowel-initial words (‘I’ [ay]), which suggests that segmen-
tal faithfulness dominates Onset. Not resolved here is what prohibits the whole
/Y hierarchy from outranking Onset and thus producing a language with no
onsets at all. The same open question exists in Prince and Smolensky’s (1993) pre-
sentation of syllabification, since theoretically their whole *Margin/ hierarchy
(which prohibits segments from being parsed as margins) could outrank Onset,
yielding a language with only nuclear segments.
12. Recall that input /r/ would be realised as [w] in onsets in G’s speech, cf. (2b).
13. [Editors’ note] A referee notes that *Complex appears to be active in English
hypocoristics (cf. Sandy from Sandra).
14. G’s use of fi- is very similar to A’s (Smith 1973) use of the syllable ri- in practically
the same environment (A’s was initial unstressed syllables whose vowel was i or
ə. It is hard to tell whether the vowel quality is important or not, since virtually
all unstressed vowels can be reduced to ə). Like A, G shows some variation, so
that G’s ‘piano’ has been pronounced as both pinæno and fi-næno. In G’s case it is
my impression that the occasional non-use of fi- is most likely to occur in words
which begin with labial consonants. I assume this is because G can pronounce an
underlying labial consonant with as little violence as possible to the set fi- form.
It is worth speculating on why these two children, but apparently no others in the
acquisition literature, used this type of pretonic dummy syllable. My hunch is that,
while the strategy may be quite rare, it is probably more common than the literature
suggests. G used fi- for almost a year, but it might be that a child with nonlinguist
parents would experience such a lack of parental comprehension in the face of a
dummy syllable that she would be highly motivated to switch to almost any other
grammar that she was developmentally ready for. Dummy syllables might therefore
occur in a number of children, but usually be rapidly suppressed.
15. At an earlier stage the initial unstressed syllables were deleted, as is typical in child
language (Kehoe and Stoel-Gammon 1997, Pater 1997, and references cited there).
The dummy syllable stage represents an intermediate stage where faithfulness to
the syllable has been promoted, but faithfulness to the segments has not.
16. [Editors’ note] The issue of syllable structure in adult lexical representations has
not been settled; see Inkelas (1995) and references cited there.
17. See McCarthy and Prince (1995a) for the motivation of this constraint in adult
languages, the evidence for which includes the preservation in languages such as
Diyari (Austin 1981) of word-internal codas but not word-final ones in the face of
a prohibition on syllable codas.
18. A potential obstacle to this interpretation is that h is left in words such as fi-hayn
‘behind’, and h is arguably more sonorous than liquids (Gnanadesikan, 1995, 1997)
since it patterns with the more sonorous glides in processes such as nasal harmony.
Although G’s liquids were realised as glides (and so were safely more sonorous
than h) at the time of the transcriptions shown here, later pronunciations have onset
ls which are pronounced as lateral. Notice also that, although h patterns as very
sonorous in harmony processes, it behaves as nonsonorous in syllabification, avoid-
ing moraic positions and preferring onsets.
19. Forms discussed in section 3, such as piz from /pliz/ ‘please’, seem to violate I-
Contig as well, since the /l/ is absent in the output. As section 5 will show,
however, most of G’s onset reductions are actually effected through coalescence,
which is vacuous in a word like ‘please’. As a result, the forms in section 3 (in
which the coalescing segments of the input onset cluster are themselves contiguous)
do not violate I-Contig. The fi- forms do violate I-Contig, however, since the
coalescing consonants are separated by a vowel.
20. An analysis of the mechanism that produces the fi- is tangential to the present
treatment of G’s onsets. I would suggest, however, that the fi- can be analysed as
resulting from the conflict of faithfulness constraints (here faithfulness to syllables)
with markedness constraints of the alignment family proposed by McCarthy and
Prince (1993). An alignment constraint requiring the stem of a word to coincide
at the left edge with the tonic syllable would have the result that underlying stem
material in an unstressed initial syllable is deleted. (This would also account for the
earlier stage in which unstressed initial syllables were merely absent.) Faithfulness
to the syllable (but not the segments) has been promoted by the fi- stage, and can be
satisfied by prefixation. An analysis of fi- as a prefix could account for its nondefault
shape and its secondary stress, which sets it off somewhat from the following stem
(something I have tried to represent by the hyphen between fi and the following
segments). It is my suspicion that the segmental content of the prefix fi- came from
‘VNA’, the name of G’s day-care centre, which G pronounced as fi- εn. Other types of
melodic overwriting, such as in Kolami, may be due to a conflict between alignment
constraints on a prefix and alignment constraints on the stem segments.
21. [Editors’ note] The apparent discrepancy between forms like tree [pi] in (15) and
draw [dɔ] in (2) will be explained in section 6.
22. G pronounces English r and w both as [w], cf. (2). I leave aside the question of
whether this substitution is in the input or the output. To stress the labiality of r, I
henceforth render r as rw in G’s input. In her output, of course, it is w.
23. [Editors’ note] For a fusion analysis of coalescence in correspondence theory, see
McCarthy and Prince (1995) and Pater (1999: 315).
24. [Editors’ note] The output [tʷ] (labialised t) will have to be eliminated by some
independent markedness constraint.
25. No-coal is part of the linearity constraint of McCarthy and Prince (1995a)
which serves to prohibit both coalescence and metathesis. McCarthy and Prince
suggest that the two functions might need to be separated, and I have done so
in this chapter because G’s grammar treats coalescence and metathesis separately.
Specifically, /glide could force No-coal violations (in the fi- cases) when it could
no longer force metathesis. The constraints forbidding coalescence and metathesis
must be different and separately ranked to produce different ranges of application.
No-coal is like Lamontagne and Rice’s (1995a,b) *Multiple Correspondence,
consonants are adjacent at the morphological level, while in G’s language they are
not.
34. This illustrates a consequence of adopting McCarthy and Prince’s (1995a) distinction
between the Ident[feature] constraints and Max. Ident[feature] is only violable
by a segment that actually appears in the output. Therefore the same segment cannot
violate both Max and Ident[feature]. This is in contrast with views of faithfulness
which use Parse-segment and Parse-feature, both of which are violated
when an underlying segment fails to be parsed in the output.
35. Glides in glide-initial words such as wum ‘room’ cannot delete because of the
constraint Onset, discussed in section 3, which requires syllables to have an onset.
36. The behaviour of the small sample of words such as fok ‘smoke’, with s + m +
rounded vowel, points to a further aspect of G’s coalescence. The f of fok (‘smoke’)
combines the place of the m with the manner features of the s. The manner fea-
tures do not ‘mix and match’ in the coalescence (e.g., *pok, with a stop from the m
that is voiceless like the s), but rather all the manner features of the least sonorous
consonant remain. This suggests some feature-geometric clumping of the manner
features that does not include the place features. The same clumping is also presumably
responsible for examples such as gay (‘sky’), where the underlying form
is presumably /sgay/. The voicelessness of the s does not transfer to the neutralised
stop G has assigned to the voiced phoneme g.
References
Allerton, D. J. (1976). Early phonotactic development: some observations on a child’s
acquisition of initial consonant clusters. Journal of Child Language 3. 429–433.
Austin, P. (1981). A Grammar of Diyari, South Australia. Cambridge: Cambridge
University Press.
Boersma, P. (1998). Functional Phonology. Formalizing the Interactions between Artic-
ulatory and Perceptual Drives. The Hague: Holland Academic Graphics.
Bond, Z. S. and H. F. Wilson (1980). /s/ plus stop clusters in children’s speech. Phonetica
37. 149–158.
Chin, S. B. and D. A. Dinnsen (1992). Consonant clusters in disordered speech: con-
straints and correspondence patterns. Journal of Child Language 4. 417–432.
Chomsky, N. and M. Halle (1968). The Sound Pattern of English. New York: Harper
and Row.
Demuth, K. (1995). Markedness and the development of prosodic structure. In
J. N. Beckman (ed.) Proceedings of NELS 25. GLSA, University of Massachusetts,
Amherst.
Edwards, M. L. and L. D. Shriberg (1983). Phonology: Applications in Communicative
Disorders. San Diego: College-Hill Press.
Emeneau, M. B. (1955). Kolami: a Dravidian Language. University of California
Publications in Linguistics, Vol. 12. Berkeley: University of California Press.
Gnanadesikan, A. E. (1995). Deriving the Sonority Hierarchy from ternary scales. Paper
presented at the 1995 Annual Meeting of the Linguistic Society of America.
(1997). Phonology with Ternary Scales. Ph.D. dissertation, University of Massa-
chusetts, Amherst.
Goad, H. and Y. Rose (this volume). Input elaboration, head faithfulness, and evidence
for representation in the acquisition of left-edge clusters in West Germanic.
Grunwell, P. (1981). Clinical Phonology. London: Croom Helm.
Hankamer, J. and J. Aissen (1974). The sonority hierarchy. In A. Bruck, R. A. Fox, and
M. W. La Galy (eds.) Papers from the Parasession on Natural Phonology. CLS.
131–145.
Hashimoto, O.-K. Y. (1972). Phonology of Cantonese. Cambridge: Cambridge Univer-
sity Press.
Hayes, B. (1999). Phonetically-driven phonology: the role of Optimality Theory and
inductive grounding. In M. Darnell, E. Moravcsik, M. Noonan, F. Newmeyer, and
K. Wheatley (eds.) Functionalism and Formalism in Linguistics, Vol. I: General
Papers. Amsterdam: John Benjamins. 243–285.
Hooper, J. B. (1976). An Introduction to Natural Generative Phonology. New York:
Academic Press.
Ingram, D. (1976). Phonological analysis of a child. Glossa 10. 3–27.
(1989a). First Language Acquisition: Method, Description, Explanation. Cambridge:
Cambridge University Press.
(1989b). Phonological Disability in Children. London: Cole and Whurr (2nd edn).
Inkelas, S. (1995). The consequences of optimization for underspecification. In
J. Beckman (ed.) Proceedings of the NELS 25. 287–302.
Jespersen, O. (1904). Lehrbuch der Phonetik. Leipzig: Teubner.
Kari, J. (1973). Navajo Verb Prefix Phonology. Ph.D. dissertation, University of New
Mexico (published 1976, Garland Press).
Kehoe, M. M. and C. Stoel-Gammon (1997). The acquisition of prosodic structure:
an investigation of current accounts of children’s prosodic development. Lg 73.
113–144.
Kiparsky, P. (1994). Remarks on markedness. Paper presented at TREND 2.
Lamontagne, G. and K. Rice (1995a). Navajo coalescence, deletion and faithfulness.
Paper presented at the 1995 Annual Meeting of the Linguistic Society of America.
(1995b). A Correspondence account of coalescence. In J. N. Beckman, L. Walsh
Dickey, and S. Urbanczyk (eds.) Papers in Optimality Theory. University of
Massachusetts Occasional Papers in Linguistics 18. Amherst: GLSA. 211–223.
Light, T. (1977). The Cantonese final: an exercise in indigenous analysis. Journal of
Chinese Linguistics 5. 75–102.
Macken, M. A. (1980). The child’s lexical representation: the ‘puzzle-puddle-pickle’
evidence. JL 16. 1–17.
McCarthy, J. and A. Prince (1986). Prosodic morphology. MS., University of Massa-
chusetts, Amherst and Brandeis University.
(1990). Foot and word in prosodic morphology: the Arabic broken plural. NLLT 8.
209–283.
(1994). The emergence of the unmarked: optimality in prosodic morphology. In
M. Gonzàlez (ed.) Proceedings of the NELS 24. 333–379. Amherst: GLSA.
(1995a). Faithfulness and reduplicative identity. In J. N. Beckman, L. Walsh Dickey,
and S. Urbanczyk (eds.) Papers in Optimality Theory. University of Massachusetts
Occasional Papers in Linguistics 18. Amherst: GLSA. 249–384.
(1995b). Prosodic morphology. In J. Goldsmith (ed.) The Handbook of Phonological
Theory. Cambridge, Mass.: Blackwell.
1. Preliminaries∗
Several recent investigations of the development of left-edge clusters in West
Germanic languages have demonstrated that the relative sonority of adjacent
consonants plays a key role in children’s reduction patterns (e.g., Fikkert
1994, Gilbers and Den Ouden 1994, Chin 1996, Barlow 1997, Bernhardt and
Stemberger 1998, Gierut 1999, Ohala 1999, Gnanadesikan this volume). These
authors have argued that, for a number of children, at the stage in development
when only one member of a left-edge cluster is produced, it is the least sonorous
segment that survives, regardless of where this segment appears in the target
string or the structural position that it occupies (head, dependent, or appendix).
To illustrate briefly, while the more sonorous /S/1 is lost in favour of the stop
in /S/+stop clusters, /S/ is retained in /S/+sonorant clusters; similarly, the least
sonorous stop survives in both /S/+stop and stop+sonorant clusters, in spite
of the fact that it occurs in different positions in the two strings. To account
for reduction patterns such as these, a structural difference between /S/-initial
and stop-initial clusters need not be assumed. This would seem to fare well
in view of much of the recent constraint-based literature which de-emphasises
the role of prosodic constituency in favour of phonetically based explanations
of phonological phenomena (see, e.g., Hamilton 1996, Wright 1996, Kochetov
1999, Steriade 1999, Côté 2000).
In this chapter, we focus on a second set of reduction patterns for left-edge
clusters, one which is not addressed in most of the sonority-based literature on
cluster reduction in child language (L1): these patterns reveal a preference for
structural heads to survive. For example, while the stop is retained in clusters
of the shape /S/+stop and stop+sonorant, it is the sonorant that survives in
/S/+sonorant clusters. The only sources that we have found where explicit ref-
erence is made to the retention of heads are Spencer (1986), who re-analyses
Amahl’s data (from Smith 1973) within the context of non-linear phonology,
and Gilbers and Den Ouden (1994), who nonetheless appear to reject a head-
based approach at the end of their paper in favour of one based on sonority.
110 Heather Goad and Yvan Rose
Not surprisingly, reference to heads is also absent from the L1 literature which
predates the development of non-linear phonology, when the syllable was not
widely accepted as a formal constituent (following Chomsky and Halle 1968).
For example, although Amahl’s grammar shows a preference for the mainte-
nance of heads, Smith (1973) appeals to sonority in his discussion of cluster
reduction.2
In order to provide a unified account for both patterns of cluster reduction,
we adopt the position that the theory of syllable structure must encode a for-
mal difference between /S/-initial clusters and obstruent-initial clusters more
generally,3 where /S/ is represented as an appendix, as was standardly assumed
in the literature on non-linear phonology (see below). We shall demonstrate
further that constraints must make explicit reference to heads of syllable con-
stituents; specifically, the constraint MaxHead(Onset) will play a crucial
role in our analysis.
This approach entails that (adult) inputs are fully prosodified. However, full
prosodification requires knowledge of the structures that are permitted in the
target language. Accordingly, we propose that children’s inputs are initially
prosodified for simplex onsets and rhymes/nuclei, that is, for heads of sub-
syllabic constituents only (cf. Dresher and Van der Hulst 1998). While there
is often a correlation between the head of a left-edge cluster and low sonor-
ity, we demonstrate that heads cannot be determined solely on the basis of
relative prominence; distributional evidence must be factored in and under-
standing this evidence requires knowledge that is relatively sophisticated. We
propose that children initially make decisions about headedness on the basis
of sonority until the distributional facts are understood. We argue that it is
for this reason that two patterns of cluster reduction are observed in the ac-
quisition data, what we will henceforth call the sonority pattern and the head
pattern.
The chapter is organised as follows. In section 2, we introduce the two pat-
terns of cluster reduction under investigation. The data on which we focus
are drawn from children learning three West Germanic languages: English,
German, and Dutch. We turn in section 3 to our assumptions about acquisition
in the context of Optimality Theory (OT), the framework in which our anal-
ysis is couched. Our central point will be to demonstrate that the acquisition
of an OT grammar involves two components: (i) the elaboration of inputs; and
(ii) constraint re-ranking. Our focus will primarily be on the former, specif-
ically on the development of prosodic structure. In section 4, we turn to an
investigation of sub-syllabic structure in West Germanic. We motivate repre-
sentations for various cluster types and go through the evidence that children
require in order to achieve target-like syllabification of these clusters. As will
be seen in section 5, for clusters whose structural head corresponds to the least
sonorous segment, a single pattern of cluster reduction is observed in the ac-
quisition data. However, when the head does not correspond to what is least
Acquisition of left-edge clusters in West Germanic 111
dates provided and, for some children, after these dates as well. Note, finally,
that concerning the Dutch children, because few forms are provided by Fikkert
(1994) for a given child at any particular point in time, it is difficult to get
a clear sense of the beginning and end points for the stages described in (1)
and (3). The classification of children into patterns and the age ranges pro-
vided are therefore drawn from the tables that Fikkert provides in Appendix D
(pp. 318–329).
(1) Sonority pattern:
    Substitution patterns:   Children:                           Sources:
    obs+son → obs            Dutch:   Noortje (2;5.23-2;11.0)    Fikkert (1994)
    S+obs → obs                       Robin (1;10.9-2;0.20)      Fikkert (1994)
    S+son → S                         Tirza (2;0.18-2;1.17)      Fikkert (1994)
                             English: Gitanjali (2;3-2;9)        Gnanadesikan (this volume)
                                      Subject 25 (4;10)          Chin (1996), Barlow (1997)
Representative data from Gitanjali and Robin are provided in (2). (All forms
are transcribed as in the original sources.) Empty cells represent gaps for the
particular cluster or segment type in a given language.6 As can be seen, the least
sonorous segment is preserved for each cluster type, regardless of its position in
the target string.7 For Gitanjali, the [p]-initial outputs for ‘twinkle’ and ‘quite’
and the [f]-initial outputs for ‘smoke’ and ‘sweater’ arise through coalescence
(see Gnanadesikan this volume); as features from the least sonorous consonants
survive, these forms are consistent with the sonority pattern as well. Coalescence
also seems to be responsible for Robin’s [f]-initial output for [z ]aaien. The
consonants that survive from Robin’s clusters in ver[st]opt and [sl]apen, [p]
and [f], appear to be due to consonant harmony; importantly, in both cases,
the non-harmonised counterparts, [t] and [s], correspond to the least sonorous
member of the cluster.
Previous studies have shown that the sonority pattern of cluster reduction can
be analysed without drawing a formal distinction between obstruent-initial and
/S/-initial clusters; indeed, no syllable-internal constituency need be assumed
at all (e.g., Barlow 1997, Bernhardt and Stemberger 1998, Gnanadesikan this
volume). However, there is a second pattern of cluster reduction, what we refer
to as the head pattern, which seems to require reference to syllable-internal
structure: it is the head of the onset constituent that survives, as can be observed
in (3).8 Heads are underlined.
The table in (4) provides representative data from Amahl and Annalena.9
This pattern is characterised by reduction of a cluster to the head of the target
structure, which does not necessarily correspond to the least sonorous segment
of the string. We can observe in (4a) that for left-headed clusters, it is the
initial obstruent that survives. In (4b) and (4c), by contrast, the constituent
head is the second member of the cluster, and it is this consonant that sur-
vives, regardless of its relative sonority. While Amahl’s [w]-initial outputs
for ‘flag’ and ‘friend’ in (4a) may appear to interrupt the pattern, [w] is his
substitute for /f/, not for /l,r/; in non-harmonising contexts, /l,r/ are realised
as liquids at this stage in development (cf. Goad 1997). Note as well that in
‘three’ and ‘Shreddies’, [d̥] is Amahl’s substitute for both /θ/ and /ʃ/ in initial
position.
For completeness, we have included the /S/+rhotic data in (4d), as this cluster
is licit in German.10 However, the acquisition of /Sr/ is complicated, which is
why it is listed separately from the other S+sonorant clusters and why the head
of the cluster has not been underlined. On the basis of what was observed in
(4c), we would have expected /r/ to be retained in (4d). Counter to expectation,
though, it is /S/ that survives. /S/ is realised by Annalena most often as [θ, ð] at
this stage in development; /r/, on the other hand, surfaces as [ʁ] or [x], aside
from a few cases of [ʔ, h] and [], the latter due to consonant harmony. We delay
further discussion of /Sr/ until sections 4.3 and 6.
Similarities and differences between the head and sonority patterns can be
seen most clearly from the summary table provided in (5).
(5) Sonority pattern versus head pattern:
                              Sonority pattern:   Head pattern:
    (a) obstruent+sonorant    obstruent           obstruent
    (b) S+obstruent           obstruent           obstruent
    (c) S+sonorant            S                   sonorant
The two patterns diverge for /S/-initial clusters that rise in sonority (5c). This
is precisely where there is a mismatch between the head of the cluster and the
segment that is least sonorous. This mismatch precludes any comprehensive
analysis of the head pattern which relies on constraints that refer to relative
sonority.
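The divergence summarised in (5) can be made concrete with a small Python sketch. It is offered as an illustration only, under stated assumptions: segments are reduced to the three classes used in (5), the numeric sonority scale is a toy one in which ‘obstruent’ stands for the non-/S/ obstruents and ranks below /S/, and the function names are hypothetical. Headedness follows the description above, with /S/-initial clusters headed by their second member and other clusters headed by their first.

```python
# Toy sonority scale over the three segment classes used in (5).
# Assumption: "obstruent" stands for the non-/S/ obstruents, ranked below /S/.
SONORITY = {"obstruent": 0, "S": 1, "sonorant": 2}


def head_of(cluster):
    """Return the structural head of a two-member left-edge cluster.

    /S/ is treated as an appendix, so the head of an /S/-initial cluster is
    its second member; other clusters are left-headed branching onsets.
    """
    first, second = cluster
    return second if first == "S" else first


def reduce_by_sonority(cluster):
    """Sonority pattern: the least sonorous member survives."""
    return min(cluster, key=lambda seg: SONORITY[seg])


def reduce_by_head(cluster):
    """Head pattern: the head of the onset constituent survives."""
    return head_of(cluster)


CLUSTERS = [("obstruent", "sonorant"), ("S", "obstruent"), ("S", "sonorant")]

for cluster in CLUSTERS:
    # Prints the cluster, then the survivor under each pattern.
    print("+".join(cluster), reduce_by_sonority(cluster), reduce_by_head(cluster))
```

The two functions agree on the first two cluster types and disagree only on S+sonorant, where the head (the sonorant) is not the least sonorous member.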
We shall provide a unified account for both the head and sonority patterns
in section 5, one which focuses on the retention of heads in both cases. We
must first discuss our assumptions about acquisition (section 3) and provide the
structures that we adopt for obstruent-initial and /S/-initial clusters in the target
grammars (section 4).
formally different in the sense that early grammars, at every stage in their devel-
opment, reflect possible adult grammars. Thus, we contend that all systematic
alternations observed in child data must be analysed using constraints that are
independently motivated in adult grammars, and further, we adopt the strong
position that all constraints are innate (Gnanadesikan this volume). Conse-
quently, (i) children’s grammars contain no more than adult grammars, that is,
there are no child-specific constraints (for a different view, see Pater 1997); and
(ii) they contain no less than adult grammars, that is, there is no emergence
of constraints or of the primitives, structures, and operations that they refer to
(Goad 2001).
To a great extent, children’s early grammars differ from adult languages in
predictable ways. Perhaps the most obvious of these is that children’s early out-
puts are prosodically unmarked. For example, there is an initial preference for
CV syllables, yielded primarily by segmental deletion. As patterns such as these
are systematically observed across children, they must reflect early grammatical
organisation. In the OT literature on acquisition, this is typically expressed in
terms of preferred constraint rankings where markedness constraints initially
outrank faithfulness constraints (Demuth 1995, Smolensky 1996, Gnanadesikan
this volume, inter alia; cf. Hale and Reiss 1998). This view formally expresses
the observations of Jakobson (1941/68) and Stampe (1969), that early grammars
reflect what is cross-linguistically unmarked. A crucial part of the acquisition
process therefore involves re-ranking of constraints, on the basis of positive
evidence (Chomsky 1981), in order to allow for the production of more marked
structures when these are found in the target language.
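As a rough illustration of re-ranking driven by positive evidence, the following Python sketch implements a simplified form of constraint demotion in the spirit of Tesar and Smolensky (1998, 2000). It is not the procedure assumed in this chapter: it uses a total ranking rather than strata, the function name `demote` and the dictionary encoding of violations are assumptions, and the example forms /pliz/ and piz are borrowed from Gnanadesikan (this volume) purely for convenience.

```python
from typing import Dict, List


def demote(ranking: List[str],
           winner: Dict[str, int],
           loser: Dict[str, int]) -> List[str]:
    """Return a new ranking in which the observed (adult) form can win.

    `ranking` lists constraint names from highest to lowest. `winner` and
    `loser` give the violation counts of the adult form and of the learner's
    current output. Every constraint that prefers the loser and outranks the
    highest winner-preferring constraint is moved to just below it.
    """
    winner_pref = [c for c in ranking if loser[c] > winner[c]]
    loser_pref = [c for c in ranking if winner[c] > loser[c]]
    if not winner_pref:
        return list(ranking)  # nothing favours the winner; leave ranking as is
    pivot = winner_pref[0]    # highest-ranked winner-preferring constraint
    pivot_idx = ranking.index(pivot)
    to_demote = [c for c in loser_pref if ranking.index(c) < pivot_idx]
    kept = [c for c in ranking if c not in to_demote]
    insert_at = kept.index(pivot) + 1
    return kept[:insert_at] + to_demote + kept[insert_at:]


initial = ["*Complex", "Max"]          # markedness >> faithfulness
adult_form = {"*Complex": 1, "Max": 0}  # 'pliz' keeps the cluster
child_form = {"*Complex": 0, "Max": 1}  # 'piz' loses a segment

updated = demote(initial, adult_form, child_form)
print(updated)  # ['Max', '*Complex']
```

Applying `demote` again to the updated ranking with the same pair of forms leaves it unchanged, since the markedness constraint no longer outranks the faithfulness constraint that favours the adult form.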
4. Representations
As already mentioned, an analysis of the sonority pattern can be provided
without reference to syllable-internal structure: what survives can be determined
solely by a prohibition on consonant sequencing and by a family of constraints
that reflect relative prominence in the acoustic signal. By contrast, it appears
that the head pattern cannot be so analysed: obstruent-initial clusters must be
distinguished from /S/-initial clusters on structural grounds.
Outside of acquisition, the clearest evidence of the need for two types of
structures can be found by observing that the presence or absence of obstruent-
initial clusters in a language is independent of the presence or absence of /S/-
initial clusters. A comparison of (6a) and (6b) reveals that Spanish, for example,
contains only the former structure, while Acoma (Miller 1965) allows only for
the latter.
to the only segment in the onset constituent, that is, to the rightmost member of
the cluster. Following from this, we distinguish two types of non-heads: depen-
dents which are constituent-internal; and appendices which are linked to some
higher prosodic constituent, in the unmarked case, to the prosodic word (PWd),
as we discuss below.
(7) Unmarked syllabification options for left-edge clusters:
a Branching onset: b Appendix+onset:
The notion head of onset will play a significant role in our account of L1 cluster
reduction. Outside of this context, in section 5.3, we shall provide support-
ing evidence for this notion from cluster reduction in southeastern Brazilian
Portuguese, as well as in German (hypocoristics) and Québec French.
In some languages, syllabification options other than those in (7) will be re-
quired. For example, as concerns the structure in (7a), Kaye (1985) has argued
that, in Vata, obstruent+liquid strings are syllabified with the liquid in the nu-
cleus. Concerning appendixal /S/, two options are required for West Germanic.
While German respects the unmarked option in (7b), Dutch and English require
/S/ to be linked to the syllable node in /S/+obstruent clusters. In German, /ʃ/+C
clusters occur stem-initially (e.g., be-stehen [bə.ʃte.hən] ‘insist’), but they do
not occur morpheme-internally (see, e.g., Hall 1992). /ʃ/ must therefore link
to the PWd. In Dutch and English, by contrast, /s/+C clusters have a wider
distribution. Van der Hulst (1984) uses examples such as ek.ster ‘magpie’ to
argue against analyses along the lines of (7b) for Dutch and in favour of one
where /s/ is a ‘syllable prefix’ (contra Trommelen 1984, Fikkert 1994: 51).
Similar arguments are made by Zonneveld (1993) and Booij (1995), as well
as by Levin (1985) for English. We elaborate on the problem for English as
follows. There are many words like con.sta.ble where the rhyme preceding
/s/ is binary branching. Given that, with limited exceptions, ternary-branching
rhymes are not permitted in English (see esp. Harris 1994), /s/ must be analysed
as an appendix to the following syllable.
Clearly, word-internal data are required to determine whether the syllable
or PWd is the appropriate host for /S/. If /S/ were linked to the syllable in
the unmarked case, the German-speaking child would require indirect negative
evidence to arrive at the alternative analysis in (7b). We propose, therefore, that
association to the PWd is the unmarked case. Positive evidence will then be
available for learners to determine that the syllable is the correct host in Dutch
and English.
While the typology in (6) demonstrates the need for two different representations
for left-edge clusters, (7a) and (7b), it does not address the question of
where to draw the line between cluster types. As UG requires branching onsets
to rise in sonority, there is little dispute about the correct representations
for obstruent+liquid and /S/+stop clusters (but see note 13).14 Accordingly,
once the child attempts to build structures other than singleton onsets at the left
edge, representations for these two cluster types should pose little difficulty:
obstruent+liquid clusters fit the ideal profile for a branching onset (Clements
1990), (7a); and a sonority fall or plateau is observed in /S/+obstruent clusters,
so the child will realise that /S/ cannot be part of the onset and must instead
occupy an appendix position, (7b). In the latter case, the child’s analysis will be
consistent with the observation that left-edge appendices are limited to /S/ in the
unmarked case.15 Additional positive evidence for the postulation of these two
types of structures is readily available through the child observing that three-
member clusters at the left edge are (virtually) always /S/-initial. If UG permits
onset constituents to be at most binary, as is standardly assumed, sequences of
the shape /S/+stop+liquid must involve a combination of the structures in (7a)
and (7b).
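The sonority reasoning in this paragraph can be illustrated with a minimal sketch. The numeric sonority scale, the segment inventory, and the function names below are assumptions made for illustration, not part of the analysis. The sketch only shows what sonority alone can settle: obstruent+liquid fits the branching-onset profile, /S/+obstruent falls or plateaus and so calls for an appendix, and rising-sonority /S/-initial clusters are left undetermined.

```python
# Hypothetical sonority scale (assumption): a higher value means more sonorous.
SONORITY = {"stop": 1, "fricative": 2, "nasal": 3, "liquid": 4}

# Hypothetical manner classes for a handful of segments.
MANNER = {
    "p": "stop", "t": "stop", "k": "stop",
    "f": "fricative", "s": "fricative",
    "m": "nasal", "n": "nasal",
    "l": "liquid", "r": "liquid",
}


def sonority_profile(c1, c2):
    """Return 'rise', 'plateau' or 'fall' for a two-member cluster."""
    s1, s2 = SONORITY[MANNER[c1]], SONORITY[MANNER[c2]]
    if s2 > s1:
        return "rise"
    if s2 == s1:
        return "plateau"
    return "fall"


def first_pass_parse(c1, c2):
    """Parse suggested by the sonority profile alone (illustrative only)."""
    profile = sonority_profile(c1, c2)
    if c1 == "s":
        # A fall or plateau excludes /s/ from the onset constituent.
        if profile != "rise":
            return "appendix+onset"
        # Rising /s/-initial clusters need distributional evidence.
        return "undetermined from sonority alone"
    if profile == "rise":
        return "branching onset"
    return "not a well-formed branching onset"


for cluster in ("pl", "st", "sf", "sn"):
    print(cluster, sonority_profile(*cluster), first_pass_parse(*cluster))
```

The "undetermined" outcome for /sn/ corresponds to the case taken up in the next paragraph, where distributional facts rather than sonority decide the representation.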
One of the greatest difficulties for the linguist lies in determining the appro-
priate representation for /S/-initial clusters that rise in sonority. In section 4.2
below, we shall provide a number of arguments to demonstrate that, at least in
West Germanic, all /S/-initial clusters are syllabified with left-edge appendices.
Importantly, then, there is no direct correlation between headedness and some
phonetic property (low sonority or low air flow); instead, distributional facts
must be taken into consideration. In section 5.1, we shall propose that it is for
this reason that these clusters also pose the greatest challenge for the learner.
However, based on the data from Amahl and Annalena provided earlier in (4c),
where /S/+sonorant → sonorant in the head pattern, we hope that the sceptical
reader who equates rising sonority with branching onset status can accept that
this is the analysis that these two children have arrived at.
                                   English          Dutch      German
  ii Fric+liquid                   fl  *θl  *ʃl     fl  χl     fl
                                   fr   θr   ʃr     fr  χr     fr
b Appendix-initial /S/+liquid      sl               sl         ʃl
                                   *sr              *sr        ʃr
The absence of */tl/ is typically attributed to a constraint that forbids the two
members of a branching onset from agreeing in place, what we have called
*PlaceIdent in (9). The constraint holds for all places of articulation; thus,
alongside */tl/, */pw/ is illicit in English, as is */pʋ/ in Dutch and */pv/ in
German.
(9) *PlaceIdent:
Adjacent consonants within an onset constituent cannot be specified
for the same articulator
The restriction on place identity in onsets is widely attested across languages,
strongly suggesting that it is the UG unmarked option. Accordingly, the child
needs no evidence to rule out clusters of this shape.16
Turning to (8a.ii), the ungrammatical branching onsets with fricative heads in
the English column provide us with more information on how *PlaceIdent
works. On the one hand, exact identity is not required to rule out a cluster: dental
/θ/ cannot combine with alveolar /l/. At the same time, palato-alveolar /ʃ/ and
post-alveolar /r/ are not close enough to be ruled out. In order to reconcile these
facts, we adopt Lebel’s (1998) proposal that *PlaceIdent targets the level
of the articulator node, and not the level of the feature that a given articulator
dominates. If all coronals – with the exception of /r/ – are specified for a Coronal
node, the correct results will obtain. See (10).17 Importantly, we assume that
the representation in (10b) holds, independently of whether /r/ is articulated as
coronal or dorsal. Indeed, in two of the languages under investigation, Dutch
and German, both coronal and dorsal variants are attested (see, e.g., Booij 1995
on Dutch, Hall 1992 and Wiese 1996 on German).
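The effect of evaluating *PlaceIdent at the level of the articulator node can be sketched as follows. This is an illustration under stated assumptions: the dictionary of articulator nodes and the function name are hypothetical, and /r/ is simply given no Place node, in line with the representation referred to as (10b).

```python
# Articulator nodes assumed for illustration. /r/ carries no Place node,
# whether it is realised as coronal or dorsal.
ARTICULATOR = {
    "p": "Labial", "f": "Labial", "w": "Labial", "v": "Labial",
    "t": "Coronal", "θ": "Coronal", "ʃ": "Coronal", "l": "Coronal",
    "k": "Dorsal",
    "r": None,
}


def violates_place_ident(head, dependent):
    """True if both members of a branching onset share an articulator node."""
    a, b = ARTICULATOR[head], ARTICULATOR[dependent]
    return a is not None and a == b


# Dental, alveolar and palato-alveolar segments all count as Coronal here,
# so */θl/ and */ʃl/ are excluded while /θr/ and /ʃr/ are not.
for head, dep in [("t", "l"), ("p", "w"), ("θ", "l"), ("ʃ", "l"),
                  ("t", "r"), ("θ", "r"), ("ʃ", "r"), ("p", "l"), ("k", "l")]:
    status = "*" if violates_place_ident(head, dep) else " "
    print(f"{status}/{head}{dep}/")
```

Comparing articulator nodes rather than individual place features is what lets a single check cover both the "not exactly identical but still excluded" case (/θl/) and the "phonetically close but permitted" case (/ʃr/).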
adopting the representation in (10b) is that it enables us to account for the place
facts observed in (8a): coronal-initial branching onsets which contain /r/ as their
dependent will not violate *PlaceIdent in (9). Beyond this, however, the
representation in (10b) will also factor into our analysis of the asymmetries
in the inventories of /S/-initial clusters provided in (8b) above; in particular, it
will play a central role in our explanations for: (i) why /sl/ is licit in English
and Dutch while */sr/ is not (section 4.3); and (ii) why Annalena incorrectly
analyses German /ʃr/ as a branching onset (section 6).
(13) AppendixLicensing(OnsetHead/Place):
An onset head must bear a Place node in order to license an appendix.
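As a rough illustration of (13), the check below asks whether a given onset head could license an appendix. The dictionary and function name are hypothetical, and the absence of a Place node on /r/ is the assumption carried over from the representation in (10b). Since the constraint is violable, a violation does not by itself make a cluster impossible in a given language.

```python
# Place nodes assumed for illustration; /r/ has none.
PLACE_NODE = {
    "p": "Labial", "t": "Coronal", "k": "Dorsal",
    "m": "Labial", "n": "Coronal", "l": "Coronal",
    "r": None,
}


def satisfies_appendix_licensing(onset_head):
    """True if the onset head bears a Place node and can license an appendix."""
    return PLACE_NODE[onset_head] is not None


# /S/+l satisfies the constraint; /S/+r violates it.
print(satisfies_appendix_licensing("l"))
print(satisfies_appendix_licensing("r"))
```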
(ii) that there is no branching within the onset constituent, which is universally
left-headed (see (7)); and (iii) that the initial /S/ is linked directly to higher
prosodic structure (to the PWd, in the unmarked case, as discussed in section 4).
(14c) requires knowledge similar to that involved in the analysis of (14b); however,
in all cases other than Dutch /sχ/, which we leave aside, head status is
assigned to the least sonorous segment in the string and, thus, comes for free,
as in (14a).
(14) Adult inputs:
a Branching onset b Rising sonority /S/-initial c Falling sonority /S/-initial
We propose that these elaborate input structures are not mastered at the
earliest stage in acquisition. Indeed, they cannot be, as not all languages al-
low for branching constituents and appendices. Children’s inputs are initially
prosodified for simplex onsets and rhymes/nuclei only, as in (15). That is, as
far as prosodic structure is concerned, inputs only contain CV (core) syllables.
Any non-prosodified segmental material that is perceived by the child remains
unassociated. As all constituents must be headed, the segments syllabified as
singleton onsets and nuclei from each of the strings in (15) are by definition the
heads of their respective constituents.
What mechanism, though, does the child use to select heads? Without any
knowledge of the relations that hold across strings of segments, the head of the
onset and the head of the rhyme/nucleus can only be defined on the basis of
relative prominence, low sonority (or low airflow) in the former case and high
sonority in the latter, in the spirit of Jakobson’s (1941/68) Principle of Maximal
Contrast. In the strings in (15), which represent the first stage in development
(stage 1), the least sonorous consonant will be assigned head status by default.
This strategy leads to selection of the correct onset head in (15a) and (15c),
as in these cases, there is a correlation between the head of a left-edge cluster
and low prominence. However, the incorrect head is selected in rising sonority
/S/-initial clusters, (15b).
(15) Inputs for sonority pattern child (stage 1):
a Branching onset b Rising sonority /S/-initial c Falling sonority /S/-initial
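The stage 1 strategy for head selection can be sketched as follows. The numeric sonority values and both function names are assumptions made for illustration. The sketch compares the head chosen by least sonority with the structurally defined target head (the first member of a branching onset, the second member of an appendix-initial cluster).

```python
# Hypothetical sonority values (assumption): a lower value is less sonorous.
SONORITY = {"p": 1, "t": 1, "k": 1, "f": 2, "s": 2,
            "m": 3, "n": 3, "l": 4, "r": 4}


def stage1_head(cluster):
    """Stage 1: the least sonorous consonant is assigned head status."""
    return min(cluster, key=SONORITY.__getitem__)


def target_head(cluster):
    """Adult head: second member after appendixal /s/, otherwise first."""
    return cluster[1] if cluster[0] == "s" else cluster[0]


for cluster in ("pl", "st", "sl", "sn"):
    match = stage1_head(cluster) == target_head(cluster)
    print(cluster, stage1_head(cluster), target_head(cluster), match)
```

Under these assumptions the two functions agree for the branching onset and for the falling-sonority /S/-initial cluster, and disagree only for rising-sonority /S/-initial clusters.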
Acquisition of left-edge clusters in West Germanic 129
The result is the input specifications for the sonority pattern. Indeed, we
propose that the source for the two patterns of cluster reduction is the mis-
match between the head of the onset and relative prominence in rising sonority
/S/-initial clusters. Correct selection of the head requires an understanding of
the distributional (positive) evidence available in the ambient language. The
child will be forced to take account of this evidence as s/he attempts to prosod-
ify the remaining segments in the cluster, on the way to achieving adult-like
inputs. Accordingly, through the course of acquisition, inputs become more
elaborately structured, reflecting this new understanding. The inputs in (16)
for the head pattern correspond to this stage in acquisition, which we label
stage 2.
Our analysis of the sonority pattern assumes that there is a stage in develop-
ment before there is any understanding of syllable-internal complexity beyond
the UG-given simplex onsets and rhymes/nuclei. By definition, then, the sonor-
ity pattern must precede the head pattern, as is the case for Robin (see section 2:
sonority pattern 1;10.9-2;0.20; head pattern 2;1.9-2;3.24).22 However, it need
not be the case that all children exhibit both patterns of cluster reduction in
their outputs. On the one hand, head-pattern children may have come to un-
derstand the relations that hold across strings of consonants by the time they
attempt to produce words with target clusters, thereby displaying no evidence
of having gone through the sonority stage. Prior to this, they may have only
selected words for production which do not contain (/S/+sonorant) clusters;
this so-called selection and avoidance strategy is common in early acquisition
(e.g., Ferguson and Farwell 1975, Schwartz and Leonard 1982, Stoel-Gammon
and Cooper 1984). On the other hand, sonority-pattern children may skip the
head stage of cluster reduction, as their building of target-like input structures
may coincide with the demotion of the markedness constraints which had pre-
viously prevented the realisation of clusters in their outputs. The latter scenario
is perhaps most commonly attested, thereby accounting for the fact that many
scholars have remarked on the role that sonority plays in cluster reduction (see
section 1 for references).
For children who exhibit both the sonority and head patterns of cluster re-
duction, the development from stage 1 to stage 2 will presumably be gradual,
unlike what is reflected in the transition from (15) to (16). It is likely that the
child will find it easiest to build the target-like input for a branching onset,
as (i) the head corresponds to the least sonorous segment and thus is correct
from the outset; and (ii) all that is required to prosodify the dependent is to
establish an additional link to the onset constituent. However, as long as the
markedness constraints which preclude clusters from appearing in outputs are
dominant, any stage intermediate between stages 1 and 2 for obstruent-initial
and /S/+stop clusters will be impossible to observe in the child’s outputs.
An empirical difference will only be observable for rising sonority /S/-initial
clusters.
Concerning the latter, a common stage intermediate between stages 1 and
2 appears to be that where target-like inputs are built for /S/+nasal clusters,
but where stage 1 inputs persist for /S/+lateral clusters. With markedness con-
straints ranked high, this yields the head pattern of cluster reduction for /SN/
clusters, /SN/ → /N/, but the sonority pattern for /Sl/ clusters, /Sl/ → /S/. Recall
in this context that the Dutch children in Fikkert’s study who follow the head
pattern (Tom, Robin, Catootje) unexpectedly treat /sl/ in the same manner as
fricative-initial clusters. This is probably also true of English-speaking Joan
(see note 8).23 These children thus display reductions as follows: /s/+nasal →
nasal, but /sl/ → /s/ parallel to /fl/ → /f/. We believe that these children, in
contrast to Amahl and Annalena, are in a stage intermediate between the head
and sonority patterns: they have adult-like (stage 2) inputs for /SN/ clusters,
but stage 1 inputs for /Sl/ clusters. Below, we elaborate on why we consider
their /Sl/ clusters to behave unexpectedly from the point of view of the head
pattern.
As discussed in section 4.2, both /Sl/ and /SN/ are ill-formed as branching
onsets, and some children, those who follow the head pattern in its entirety,
clearly attend to the various types of evidence available for this at once. Other
children, those who fall into the intermediate pattern, seem to have latched
onto the fact that there are significant differences between laterals and nasals
as concerns their distribution in clusters. On the one hand, the lack of place
constraints on the nasal in /SN/ clusters provides clear evidence for the child
to come to an early understanding of clusters of this shape. Specifically, in
spite of the rising sonority, labial place on /m/ signals that /Sm/ could never
be a branching onset. Thus, once the child attempts to incorporate the nasal
into his/her stage 1 input syllabification, labial place forces a re-analysis of
this segment as head. This results in the appendix-initial input representation
at stage 2 for /Sm/ and, by extension, for /Sn/ as well.
Building adult-like input representations for /Sl/ clusters, on the other hand,
is less straightforward, because /l/, unlike /m/, is an optimal onset dependent.
Thus, when the child attempts to incorporate /l/ into his/her stage 1 input forms,
s/he may first try to syllabify /Sl/ as a branching onset. The place identity
facts, however, are not consistent with this analysis. There is thus a conflict as
concerns the syllabification of /l/ that may delay some children’s incorporation
of this segment into their input syllabification, resulting in the maintenance of
the impoverished stage 1 structure for /Sl/ for some time after /SN/ clusters
have been appropriately prosodified (as per stage 2). This, in essence, is a type
of selection and avoidance. What, though, ultimately compels these children to
move beyond their stage 1 forms and to posit the appropriate input representation
for /Sl/? We suggest that it must take place on grounds of parsimony, that is,
to resolve the fact that /Sl/ is an outlier as concerns the full prosodification of
inputs.
Henceforth, we shall ignore the different treatment of /Sl/ and /SN/, and shall
consider the learners as falling into two groups only, as we had earlier: head-
pattern children and sonority-pattern children, where those children who fall
into the intermediate stage are grouped with those who follow the head pattern
in its entirety.
In the following two sections, we turn to the constraints required to capture
cluster reduction. We then demonstrate in section 5.4 that, in comparing the
sonority- and head-pattern children, even though their input representations
differ in terms of how elaborately structured they are, the same set of marked-
ness constraints yields only CV syllables on the surface for both groups. In
comparing the head-pattern children with adults, we maintain that, although
these children have target-like inputs, their grammar differs from the end-state
grammar in terms of the relative ranking of markedness constraints and seg-
mental faithfulness.
5.2 Constraints
To reflect the structural differences that are observed between branching onsets
and appendix-initial clusters, two markedness constraints are required, those
in (17).
5.3 More on MaxHead
Constraints that express faithfulness to prosodic heads have been proposed
throughout the OT literature, in particular, those that express faithfulness to
heads of feet (see, e.g., Alderete 1995, Itô et al. 1996, McCarthy 1997, Pater
2000).25 In the spirit of Alderete (1995), we have proposed a generalised head
faithfulness constraint, one that takes a number of prosodic categories as argu-
ments. As we assume that the sub-syllabic constituents of the syllable are onset,
rhyme, and nucleus, these constituents fall into the class of categories that can
be exploited by (20). The category that may appear suspect is onset. We shall
see in section 5.4 that MaxHead(Onset) plays a crucial role in our analysis
of L1 cluster reduction. A question that immediately arises is whether there is
any evidence for MaxHead(Onset) outside of acquisition. We shall briefly
demonstrate below that it plays a role in the distribution of branching onsets
in a dialect of Brazilian Portuguese. More generally, the analysis of Brazilian
Portuguese provides empirical support in favour of the position that adult inputs
must be fully prosodified, thereby confirming the need for a generalised head
faithfulness constraint like (20).
In southeastern Brazilian Portuguese, branching onsets are tolerated only in
stressed syllables. Input clusters are otherwise reduced, as can be seen from the
data in (21) (from Harris 1997: 363).
(21) Branching onsets in southeastern Brazilian Portuguese:
a prátu ‘plate’
patʃíu, *pratʃíu ‘small plate’
b lívu, *lívru ‘book’
livrétu ‘small book’
Harris (1997) points out that branching onsets are restricted to prosodically
strong positions in other languages as well. For instance, in German hypocoris-
tics, such structures are not permitted in unstressed syllables, e.g., Gabriella →
Gábi, *Gábri (see Itô and Mester 1997). Further, in most languages, branching
onsets are not tolerated before phonetically empty nuclei (with the notable
exception of continental French, e.g., sablØ ‘sand’ (Charette 1991)); most
dialects of Québec French pattern with the majority in this respect, and
alternations are observed in this
context, e.g., sabØ, *sablØ ‘sand’ vs. sabl-e ‘to sand’ (see Nikièma 1999 for a
recent structurally based account). Finally, Y. Rose (2000) has shown that the
Brazilian Portuguese pattern is found in the L1 acquisition of Québec French.
Importantly, in all of these cases, it is the head of the onset which survives
cluster reduction.
Let us now demonstrate how MaxHead interacts with *Complex
to arrive at the pattern observed in Brazilian Portuguese. The tableau in (22)
reveals that two MaxHead constraints are required, MaxHead(Ft)
and MaxHead(Ons), both of which must be ranked above *Complex.
(Parentheses mark the edges of feet, and periods, the edges of syllables.)
  Input         Candidate          MaxHead(Ft)  MaxHead(Ons)  *Complex  Max
a li(vré.tu)    i.   li(vré.tu)                               *
                ii.  li(vé.tu)     *!                                   *
b (lí.vru)      i.   (lí.vru)                                 *!
                ii.  (lí.vu)                                            *
                iii. (lí.ru)                    *!                      *
Importantly, all segments prosodified within the head of the foot, the syl-
lable underlined in li(vré.tu), must surface in order to completely satisfy
MaxHead(Ft), as defined in (20). This requires that inputs be fully prosodi-
fied. If only syllable peaks were prosodified in inputs, MaxHead(Ft) could
not select between (22a.i) and (22a.ii), as the vowel from the input stressed
syllable is present in both candidate outputs.
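The evaluation just described can be made concrete with a small sketch of strict domination. Several things here are assumptions for illustration: the ranking MaxHead(Ft), MaxHead(Ons) >> *Complex >> Max (with Max as the lowest-ranked constraint), the dictionary-based input format, the omission of stress marks, and the use of segment identity as a stand-in for input-output correspondence (each segment occurs once per word in these examples).

```python
# Assumed ranking, highest first.
RANKING = ["MaxHead(Ft)", "MaxHead(Ons)", "*Complex", "Max"]


def profile(inp, syllables):
    """Violation counts for one candidate, ordered by RANKING."""
    surfaced = {seg for onset, nucleus in syllables for seg in onset + nucleus}
    counts = {
        # Every input segment of the foot's head syllable must surface.
        "MaxHead(Ft)": len(inp["foot_head"] - surfaced),
        # Every input onset head must surface.
        "MaxHead(Ons)": len(inp["onset_heads"] - surfaced),
        # One violation per output syllable with a complex onset.
        "*Complex": sum(len(onset) > 1 for onset, _ in syllables),
        # One violation per deleted input segment.
        "Max": len(set(inp["segments"]) - surfaced),
    }
    return tuple(counts[c] for c in RANKING)


def optimal(inp, candidates):
    """Strict domination: compare violation profiles lexicographically."""
    return min(candidates, key=lambda name: profile(inp, candidates[name]))


# li(vre.tu): the stressed syllable 'vre' is the head of the foot.
LIVRETU = {"segments": "livretu", "foot_head": {"v", "r", "e"},
           "onset_heads": {"l", "v", "t"}}
LIVRETU_CANDS = {
    "li.vre.tu": [("l", "i"), ("vr", "e"), ("t", "u")],
    "li.ve.tu": [("l", "i"), ("v", "e"), ("t", "u")],
}

# (li.vru): the stressed syllable 'li' is the head of the foot.
LIVRU = {"segments": "livru", "foot_head": {"l", "i"},
         "onset_heads": {"l", "v"}}
LIVRU_CANDS = {
    "li.vru": [("l", "i"), ("vr", "u")],
    "li.vu": [("l", "i"), ("v", "u")],
    "li.ru": [("l", "i"), ("r", "u")],
}

print(optimal(LIVRETU, LIVRETU_CANDS))
print(optimal(LIVRU, LIVRU_CANDS))

# If only the syllable peak were prosodified in the input, MaxHead(Ft)
# would be satisfied by both candidates and *Complex would decide.
PEAK_ONLY = dict(LIVRETU, foot_head={"e"})
print(optimal(PEAK_ONLY, LIVRETU_CANDS))
```

The last call illustrates the point about full prosodification: with a peak-only head, the sketch wrongly favours the reduced form in the stressed syllable.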
A consequence of full syllabification is that heads of constituents must be
structurally defined, rather than based on relative prominence. This is particu-
larly important as concerns onsets; although in the case of feet, the head syllable
is the most salient, recall from earlier discussion that in left-edge clusters, there
is no correlation between relative prominence and headedness. This, in fact, is
central to our explanation of the two patterns of cluster reduction observed in
early grammars.
5.4 Tableaux
In the following sections, we illustrate how highly ranked MaxHead(Ons)
yields the desired outputs for both the head and sonority patterns of clus-
ter reduction. Recall from earlier discussion that the difference between
these patterns can only be observed in /S/+sonorant clusters. In input
branching onsets and /S/+obstruent clusters, the distinction is obscured
by the fact that the head of the cluster is also the segment that has the
lowest sonority. In this section, we shall exemplify how the constraints pre-
sented in section 5.2, in combination with the input representations pro-
posed in section 5.1, yield the correct results for the patterns observed. We
shall focus on branching onsets and on rising sonority /S/-initial clusters
only.
5.4.1 Branching onsets For children who follow either pattern of cluster
reduction, the ranking of constraints is the same: the markedness constraints
*Complex and *App-Left dominate, thereby ensuring that no clusters are
realised on the surface, at the expense of violating Max. The highly ranked
MaxHead will determine which consonant from the input survives. (As our
focus is now on onsets, we abbreviate MaxHead(Ons) as MaxHead for
convenience.)
As discussed earlier, children who follow the sonority pattern equate headed-
ness with low sonority. Headedness is thus correctly assigned to the stop in (23),
in spite of the fact that input obstruent+liquid clusters contain no knowledge
of branching structure.
For children who follow the head pattern, by contrast, inputs are adult-like and
headedness is thus structurally determined, as can be seen from the input in
(24).
In the list of candidates in (23) and (24), those which preserve the input
cluster, (23a) and (24a), fail on the markedness constraint *Complex.26 The
decision between the remaining two candidates rests with MaxHead. While
the inputs for children who follow the sonority versus head patterns differ, in
both cases, the head of the cluster is the obstruent. Thus, MaxHead selects
candidate (b) in both (23) and (24). In candidate (c), the input head has been
deleted in favour of the liquid.
In the interest of space, we shall not provide a tableau for the adult grammar.
As mentioned earlier, the adult inputs are the same as those for the head-pattern
child. The two grammars differ in terms of their ranking. In the adult gram-
mar, the faithfulness constraints dominate the markedness constraints under
discussion with the result that the candidate with the branching onset, (24a), is
selected as optimal.
5.4.2 /S/+sonorant clusters While the same output is selected for both
head- and sonority-pattern children in obstruent+liquid and /S/+obstruent clus-
ters, this is not the case in /S/+sonorant clusters, as the structurally determined
head does not correspond to the segment that is least sonorous. Although the
correct head in clusters of this shape is the second consonant, children who
abide by the sonority pattern wrongly assign head status to /S/, as depicted in
the input representation in (25).
The input structure in (26), by contrast, reveals that children who follow the
head pattern have correctly assigned head status to the sonorant, on the basis of
the patterns of distribution to which they have been exposed. It is this difference
in headship in inputs that yields the two patterns of cluster reduction observed
on the surface.
The first candidate in each of (25) and (26) violates the undominated *App-
Left.27 Interestingly, (25a), which is the correct adult output, also violates
MaxHead. This would not be the case in the adult grammar, as inputs for
clusters of this shape have head status assigned to the liquid. (25b) and (25c) are
the remaining contenders for the sonority pattern. As the latter fatally violates
MaxHead, the form with liquid deletion is selected as optimal. The same
reasoning applies to the remaining two candidates for the head pattern in (26),
with one crucial difference: children who follow this pattern correctly assign
head status to the sonorant. Thus, MaxHead preserves this segment over /S/,
as in (26c).
In the adult grammar, inputs are identical to those posited by children who
follow the head pattern, as discussed earlier. Since both faithfulness constraints
dominate *App-Left, (26a) would be selected as optimal.
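The logic of sections 5.4.1 and 5.4.2 can be summarised in a short sketch. It is a simplification under stated assumptions: a single hypothetical constraint "Markedness" stands in for *Complex and *App-Left, MaxHead is checked only by whether the input head surfaces (so the additional MaxHead violation noted for (25a) is not modelled), and the function and ranking names are illustrative.

```python
# Assumed rankings, highest first.
CHILD = ["Markedness", "MaxHead", "Max"]
ADULT = ["MaxHead", "Max", "Markedness"]


def reduce_cluster(cluster, head, ranking):
    """Return the optimal output for a two-member cluster with a given input head."""
    candidates = [cluster, cluster[0], cluster[1]]

    def profile(output):
        counts = {
            # Any output cluster violates the stand-in markedness constraint.
            "Markedness": int(len(output) > 1),
            # The input head must have a correspondent in the output.
            "MaxHead": int(head not in output),
            # One violation per deleted segment.
            "Max": len(cluster) - len(output),
        }
        return tuple(counts[c] for c in ranking)

    return min(candidates, key=profile)


# Same child ranking, different input heads for /sl/.
print(reduce_cluster("pl", "p", CHILD))   # branching onset, either pattern
print(reduce_cluster("sl", "s", CHILD))   # sonority pattern (stage 1 input)
print(reduce_cluster("sl", "l", CHILD))   # head pattern (stage 2 input)
print(reduce_cluster("sl", "l", ADULT))   # adult ranking, target-like input
```

The only thing that changes between the second and third calls is the head recorded in the input, which is the sense in which the two reduction patterns follow from representations rather than from re-ranking.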
Our analysis of the head and sonority patterns has appealed to highly ar-
ticulated representations as well as to constraints that directly reference these
representations. It has not relied on a difference in constraint ranking across
the two stages. While our account of the sonority pattern, stage 1, was built on
the relationship between sonority and headedness, this was only true as concerns
the elaboration of inputs. We did not invoke a family of sonority-based con-
straints for the selection of outputs, because these constraints appear to play
no role in the head pattern. In the next section, we briefly consider an alter-
native analysis, one where the differences observed between the two patterns
of cluster reduction arise because of differences in ranking between sonority-
based constraints and MaxHead. We shall argue that such an analysis must be
rejected.
to heads, given that all linguistic constituents are headed, that heads have a
primacy not accorded to non-heads, and following from this, that non-heads
are dependent for their very existence on the presence of a head (see earlier
section 4).
Finally, as far as we have observed, the only widely attested pattern of cluster
reduction that appears not to rely on head preservation is the sonority pattern.28
Other possible patterns seem to be rare at best, for instance, the contiguity
pattern where the consonant which survives is the one contiguous with the input
vowel: obstruent+sonorant → sonorant, /S/+stop → stop, /S/+sonorant →
sonorant. Indeed, our investigations have turned up only two children whose
grammars follow this pattern (see also Bernhardt and Stemberger 1998: 388–
389). One is Subject 12, a language-delayed learner of English (from Chin and
Dinnsen 1992); the other is María, a normally developing learner of Spanish
(Lleó and Prinz 1996).29 The rarity of children whose cluster reduction patterns
respect contiguity is consistent with the analysis provided earlier in section 5.
As we proposed a head-based analysis of the sonority pattern, we reiterate our
point that heads have a primacy not granted to non-heads.
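For comparison, the three reduction patterns mentioned in this section can be set side by side. The sketch below works over coarse cluster types rather than segments; the sonority ranks and function names are assumptions made for illustration only.

```python
# Coarse, hypothetical sonority ranks for the labels used in the text.
RANK = {"obstruent": 1, "stop": 1, "/S/": 2, "sonorant": 3}

CLUSTER_TYPES = [
    ("obstruent", "sonorant"),
    ("/S/", "stop"),
    ("/S/", "sonorant"),
]


def sonority_survivor(cluster):
    """Sonority pattern: the least sonorous member survives."""
    return min(cluster, key=RANK.__getitem__)


def head_survivor(cluster):
    """Head pattern: the onset head survives (second member after /S/)."""
    return cluster[1] if cluster[0] == "/S/" else cluster[0]


def contiguity_survivor(cluster):
    """Contiguity pattern: the member adjacent to the vowel survives."""
    return cluster[-1]


for cluster in CLUSTER_TYPES:
    print("+".join(cluster), "->",
          sonority_survivor(cluster),
          head_survivor(cluster),
          contiguity_survivor(cluster))
```

On these assumptions the contiguity pattern departs from the head pattern only for obstruent+sonorant clusters, while the sonority pattern departs from it only for /S/+sonorant clusters.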
Before concluding our discussion of cluster reduction, one pattern remains to
be analysed: Annalena’s treatment of /S/+rhotic clusters. We turn to this topic
next.
Beyond the reduction patterns, additional support for the branching onset
analysis of Annalena’s /ʃr/ comes from examining the point at which she acquires
the various cluster types. The table in (28) reveals that /ʃr/ clusters emerge
as target-like at the same point as do obstruent+/r/ (Cr) clusters. The remaining
/ʃ/-initial clusters (/ʃ/+stop, /ʃ/+nasal, /ʃl/, /ʃv/) are not acquired until much
later.30
C1    Cr    ʃr    ʃC (except ʃr)
in (29), the faithful candidate, (a), violates *Complex. Candidate (b)
violates a number of markedness constraints, notably AppLic(OnsHd/Pl),
because the head of the cluster, /r/, cannot support an appendix. In adult German,
this constraint must be ranked low to permit /ʃr/ to be well-formed
as an appendix-initial cluster. Of the two remaining candidates, (c) is
selected as optimal in Annalena’s grammar, as it does not incur a violation of
MaxHead.
Turning to (30), in this case, Annalena has the correct adult input. AppLic
(OnsHd/Pl) is not violated by candidate (a) because the head of the cluster,
/l/, bears a Place node (cf. (10a)). Nevertheless, dominant *App-Left prevents
this form from surfacing in favour of candidate (d), the only other form where
MaxHead is respected.
a /sn/ i. [sn] *
ii. [n] * *!
b /sw m/ i. [sw m] * *!
ii. [s m] * *!
iii. [w m] *
From the limited data provided in Barlow and Gierut, Child 24 appears
to follow the head pattern at age 5;3.31 Importantly, then, (31) demonstrates
that we cannot reject an analysis of the head pattern which appeals to high-
ranking *S, rather than to MaxHead(Ons), merely by observing a child
who deletes /s/ only in clusters. We must therefore consider other evidence
to tease apart the two approaches. In the next section, we demonstrate that
evidence is available from Amahl’s grammar to support the MaxHead
approach.
a Branching onset   /θr/   [d̥, t] (8) 67%   [r, l] (4) 33%
                    /ʃr/   [d̥] (2) 67%      [ʁ] (1) 33%
a /θr/ i. θ *!
ii. d̥ *
iii. r *!
b /ʃr/ i. ʃ *!
ii. d̥ *
iii. r *!
c /sl/ i. s *!
ii. d̥ *!
iii. l *
a /sw/ i. s *!
ii. d̥ *
iii. w *!
a /θr/ i. θ *!
ii. d̥ *
iii. r *! *
b /ʃr/ i. ʃ *!
ii. d̥ *
iii. r *! *
c /sl/ i. s *!
ii. d̥ *!
iii. l *
d /sw/ i. s *!
ii. d̥ *!
iii. w *
8. Conclusion
In this chapter, we have discussed two patterns of reduction for left-edge clusters
that are observed in the acquisition of West Germanic languages. These patterns,
which we called the sonority and head patterns, differ in their treatment of rising
sonority /S/-initial clusters: in the sonority pattern, the least sonorous consonant
is retained in output forms (e.g., /Sl/ → [S]); in the head pattern, it is the sonorant
that survives (e.g., /Sl/ → [l]), that is, the consonant that is the head of the onset
in the target form.
We argued that the two patterns of cluster reduction are representative of
distinct stages in development that differ in the degree to which inputs are
elaborated, rather than differing in constraint ranking. We proposed that the
head pattern, stage 2 in development, can best be accounted for if highly struc-
tured target-like inputs are posited, while the sonority pattern, stage 1, arises
from inputs that are less articulated: only heads of sub-syllabic constituents are
specified. At stage 1, there is no knowledge of the structural relations that hold
across strings of consonants, and thus, the head of the onset can only be defined
on the basis of relative prominence. This led to incorrect head selection for ris-
ing sonority /S/-initial clusters, clusters where the head in the target grammar
is unexpectedly the segment of highest sonority.
Correct understanding of the head of a cluster requires that children take
account of the distributional evidence available in the ambient language. In
order to determine the type of evidence that is available, we systematically
examined the distributional facts for the three languages under investigation.
These facts support the view that all /S/-initial clusters, regardless of their
sonority profile, are represented with a left-edge appendix. We argued that the
sonority pattern child will be forced to take account of this evidence as s/he
attempts to prosodify all of the segments in a cluster, on the way to achieving
target-like inputs.
We demonstrated that a single ranking can be motivated for both patterns of
cluster reduction if a constraint requiring faithfulness to the head of the onset
constituent, MaxHead(Ons), is highly ranked. The markedness constraints
*Complex and *App-Left must be undominated, thereby ensuring that
no clusters are realised on the surface. As we argued that the inputs for the
two groups of children differ in terms of headedness, MaxHead(Ons) will
appropriately select the consonant that survives from the input cluster.
Finally, our analysis has relied on the premiss that children’s inputs are built
up through the course of acquisition until they reach the target stage when
inputs are fully prosodified. While we provided some evidence from Brazilian
Portuguese for fully prosodified inputs in adult grammars, the implications of
such a proposal have not been thoroughly explored. We leave this to future
research.
notes
* Earlier versions of this paper were presented at the 1st North American Phonol-
ogy Conference and at the 26th Annual Boston University Conference on Language
Development. We should like to thank the audiences for questions and comments.
We should also like to thank Angela Carpenter, Della Chambless, Suzanne Curtin,
Janet Grijzenhout, John Matthews, Joe Pater, and Wim Zonneveld for comments that
greatly improved the chapter. This research was supported by grants from FCAR
and SFB (to H. Goad), from SSHRC (to G. L. Piggott and H. Goad), as well as
by a SSHRC postdoctoral fellowship (to Y. Rose). The authors can be reached at:
[email protected] and [email protected].
1. We use the symbol /S/ as a cover for segments which will be analysed as left-edge
appendices below. For the most part, /S/ corresponds to /s/ in English and Dutch and
to /ʃ/ in German, the three languages under investigation. Further details are provided
in sections 2 and 4.
2. Smith states that /s/+sonorant clusters are exceptions to the pattern that the least
sonorous member is retained, as /s/ is always deleted ‘despite its inherent prominence’
(p. 166). In discussing the same type of data, Ingram (1989: 32) states that ‘deletion
of the marked member’ is the strategy most commonly observed in children’s cluster
reductions, but no definition of markedness is provided.
3. Henceforth, we use the term obstruent to refer to all obstruents with the exception
of /S/.
4. The label /S/+obstruent in (1) is used throughout the chapter instead of /S/+stop, as
Dutch permits /sχ/ (see further section 4.2). (All three languages also contain /sf/,
but it is restricted to a handful of low-frequency loans.)
5. While Fikkert (1994) mentions the important role played by sonority in cluster re-
duction, only three of the nine children for whom she provides summaries in her
Appendix D (see below in the text) are listed in (1), as these are the only subjects
whose outputs unambiguously support the sonority pattern. As we shall see shortly,
the crucial difference between the sonority and head patterns rests with the treatment
of /S/+sonorant clusters. However, all of the Dutch children in Fikkert’s study treat
/sl/ and /z / in the same manner as fricative–initial clusters. (The same probably holds
true of English-speaking Joan for /sl/; see note 8.) Thus, the only potential difference
between the sonority and head patterns rests with the treatment of /S/+nasal clusters.
Fikkert points out (p. 92) that clusters of this shape are rarely attempted by Dutch
children. (A similar observation is made by Lohuis-Weber and Zonneveld 1996:
note 20.) Indeed, two of her subjects (Jarmo, Elke) never attempt clusters of this
shape and some avoid /S/+nasal until too late in development; thus, we cannot tell
from their outputs which reduction pattern was favoured. In section 5.1, we shall
provide an explanation for the treatment of /sl/ as fricative-initial.
6. Regarding the gap for /Sk/, Dutch – as well as German – has a number of /sk/-
initial loans (German no longer has /ʃk/-initial words). Some of these loans are high
frequency and will no doubt be present in the early input to which children are
exposed. However, we have no information on their acquisition in either language,
and so they will not be considered further.
7. This assumes that fricatives are more sonorous than stops, a proposal which is sup-
ported on phonetic grounds but which has been challenged on phonological grounds
(see, e.g., Hall 1992 and Wiese 1996 on German). As will be seen in section 5.1, the
phonetic difference between the sonority value of stops and fricatives also plays a
role in our analysis of the construction of inputs by children who follow the sonority
pattern.
8. As alluded to in note 5, for Joan, definitive evidence for the head pattern is found in
/s/+nasal reductions only. (/sw/ undergoes fusion to [f], which is compatible with
both patterns of cluster reduction.) When /sl/ is attempted at 2;1, it is realised as [s],
not as [l]. However, target /l/ is realised as [z] at this stage; thus, we cannot be entirely
sure about whether output [s] from /sl/ is preservation of /s/ or whether it arises from
fusion of /s/ and /l/. For Naomi, evidence for the head pattern is also limited to
/ʃ/+nasal; she does not attempt /ʃl/ (Janet Grijzenhout, personal communication).
9. [b̥, d̥, g̊] represent voiceless unaspirated lenis stops in Amahl’s outputs (Smith 1973:
37). Overdots in Annalena’s forms indicate that a consonant is ambisyllabic (Elsen
1991: 10).
10. This implies that /Sr/ is illicit in Dutch and English. As concerns Dutch, /sχr/ is
realised as [sr] by many speakers (see Waals 1999: 23 for acoustic evidence). If this
reflects a re-analysis of /sχr/, then /Sr/ clusters may be present in this language as
well. However, unlike with German, we have no information on the acquisition of
[sr] in Dutch. We shall henceforth consider this cluster to be ill-formed in Dutch,
although nothing rests on this. As concerns English, [ʃr] is not /S/-initial but is,
instead, analysed as a branching onset, as can be seen from its location in (2) and
(4). See sections 4.1 and 4.3 for further discussion.
11. This is in spite of Richness of the Base, which contends that all inputs are possible,
as the burden of selecting the correct output rests solely with constraint ranking
(Prince and Smolensky 1993). We agree with Hale and Reiss (1998) that Richness
of the Base is concerned with OT as a computational system, and not with real
language learners.
12. This is opposite to the view espoused by Gnanadesikan (this volume). She proposes
that the child’s inputs are specified for syllable structure, although she does not
accept that this holds true of the target grammar. The child first builds inputs on the
basis of adult outputs, that is, from forms which are fully prosodified. Throughout
the course of acquisition, these representations must be pruned back.
13. Two alternative options for /S/ at the left edge are as follows: (i) /S/+stop clusters
form single (complex) segments (e.g., Fudge 1969, Ewen 1982, Selkirk 1982, Van
de Weijer 1996, Wiese 1996); and (ii) /S/ is the coda of an empty-headed syllable
(Kaye 1992). We shall not address these options further.
14. The representation in (7a) has been provided as the unmarked structure for
obstruent+liquid clusters only. Nasal- and glide-final clusters are not included, in
spite of the fact that we listed /kn/ and /CW/ alongside what are indisputably branch-
ing onsets in section 2. In the case of /kn/, this was because Dutch and German
children who follow the head pattern of cluster reduction retain the stop, not the
nasal, consistent with the branching onset analysis. Further, Fikkert (1994) points
out that all of the Dutch children go through a stage where /kn/ clusters are produced
as [kl]. The same observation holds for German-speaking Annalena (Elsen 1991).
However, the issues that arise in the syllabification of /kn/ are complicated; we refer
the interested reader to Trommelen (1984) and Kager and Zonneveld (1985/86), who
treat Dutch /kn/ as appendix-initial (see also Van der Hulst 1984), and to Hall (1992),
Booij (1995), and Wiese (1996), who treat /kn/ as a branching onset. In the case
of /CW/, these clusters are branching onsets in the languages under consideration.
However, the possibility that this reflects the unmarked representation for clusters
of this shape has been challenged by Y. Rose (1999). Further, early French learners
treat obstruent+liquid clusters differently from obstruent+glide, casting doubt on
the branching onset structure as the unmarked analysis for obstruent+glide (see
Y. Rose 2000). In the interest of space, we do not consider /kn/ and /CW/ clusters
further.
15. Beyond the fact that the coronality of /S/ makes it unmarked on the place dimension,
this restriction on the quality of appendices appears to be due to the fact that the
stridency of /S/ makes it audible in all contexts, even when adjacent to a stop.
16. Somewhat surprisingly, children do produce clusters of this type, e.g., Amahl’s
[tlæmp] ‘tramp’. Goad (1996) has argued that, in such cases, the constraint in (9)
has not been violated, as Coronal has not yet been projected for [l], evidence for
which comes from the behaviour of liquids in consonant harmony. Fikkert (1994:
120–122) provides similar examples from the acquisition of Dutch, e.g., Jarmo’s
[tlɛi] ‘train’.
17. The representations in (10) reveal that we assume feature geometry. Although this
is the case, it is not essential to the point being made. All that is required is for
the segments in (10a) to bear Coronal; exactly how this is formally expressed is
somewhat tangential.
18. Other arguments are available which, in the interest of space, we do not provide; on
English, for example, see Kaye (1992), which includes discussion of the psycho-
linguistic evidence obtained by Treiman, Gross, and Glavin (1992). Additional ar-
guments appeal to gaps in the expected inventory of clusters; if rising-sonority
/S/-initial clusters were branching onsets, the child would have to use indirect neg-
ative evidence to come to the conclusion that this analysis is unavailable for certain
combinations of segments, counter to expectation.
19. Throughout the rest of the chapter, segments abbreviate skeletal positions as well
as featural content.
20. This is somewhat of a simplification. In contrast to German, recall that English
and Dutch are marked in requiring the appendix in /S/-initial clusters to link to the
syllable node, rather than to the PWd (see section 4). Henceforth, we will ignore
this difference.
21. In the interest of space, rhyme nodes have been omitted from all structures.
22. One of Fikkert’s (1994) subjects, Eva, appears to contradict this claim. From
Fikkert’s Appendix D (p. 326), we observe that Eva’s /SN/ clusters initially follow the
head pattern, and later, the sonority pattern. Two things make us reluctant to conclude
that Eva’s data are problematic. First, recall from note 5 that Fikkert remarks that
/SN/ clusters are rarely attempted by Dutch-speaking children. We suspect that this is
true of Eva, as only one example for each pattern is provided by Fikkert (head pattern:
/snœyt/ →[nœys] ‘snout’ (1;6.1); sonority pattern: /snu pi / → [zu pi ] ‘Snoopy’
(1;9.8)). It is therefore difficult to get a sense of how representative these forms are
of Eva’s overall profile. Second, the output provided for the sonority pattern surfaces
with an initial [z] which could be due to fusion of /s/ and /n/. If this were the case,
then [zu pi ] would be compatible not only with the sonority pattern, but with the
head pattern of cluster reduction as well.
23. Joe Pater (personal communication) informed us that this holds of English-speaking
Trevor as well.
24. As was evident from our structures in (12a, 12b), we do not accept the coda as a
formal constituent (see esp. Kaye 1990); coda consonants are instead linked directly
to the rhyme. This is in part because we do not accept the existence of branching
codas, as would be expected if the coda were a licit argument of *Complex. Instead,
we adopt the view that final clusters which fall in sonority are heterosyllabic, with
the second consonant syllabified as the onset of an empty-headed syllable (on the
latter, see, e.g., Giegerich 1985, Kaye 1990, Piggott 1991, Rice 1992, Zonneveld
1993, Harris 1994; see Goad and Brannen 2003 on L1).
25. These proposals, including the one in (20), make different empirical claims which
will not be addressed here. The reader is referred to the original sources for further
details.
26. We have not provided candidates where [p] is syllabified as an initial appendix.
Such candidates will violate *App-Left, MaxHead, and other constraints which
assess the segmental content of left-edge appendices.
27. The branching onset candidate has not been provided for input /Sl/. This candidate
will violate a constraint such as S=App which requires that /S/ be optimally syl-
labified as an appendix. In the case of left-edge clusters, S=App will interact with
other constraints on syllabification, in particular Onset, to ensure that /S/ is only
syllabified as an appendix when followed by another consonant.
28. Here, we are concerned with children whose reduction patterns can be described by
reference to prosodic structure, linear position, or sonority. Beyond sonority, we do
not consider patterns which may be defined on the basis of the featural content of the
segments involved. For the latter, we draw the reader’s attention to Jongstra (2000).
29. Of the four Spanish learners investigated by Lleó and Prinz, one other child, Juan,
produced a relatively high number of outputs consistent with the contiguity pattern,
54 percent (cf. María’s 84 percent). (Note that the percentages provided by the
authors collapse C+liquid and C+glide clusters, even though glides are syllabified
in the nucleus in the adult grammar.) Finally, recall from (6a) that Spanish does not
tolerate /s/+C clusters at the left edge and, thus, the data include obstruent-initial
targets only.
30. Concerning Annalena’s relatively late acquisition of obstruent+/l/ (Cl) clusters, it is
worth pointing out that Van der Torre (2001) has provided cross-linguistic evidence
in favour of obstruent+/l/ branching onsets as more marked than obstruent+/r/.
Prior to 1;11.15, Annalena goes through a brief period of metathesis, ClV -> CVl.
Concerning 2;02.15 as the point of acquisition of ʃC, there are a few cases of target-
like productions of these clusters before this time, starting at about 1;11; however,
systematic productions are not attested until 2;02.15.
31. The data provided in Barlow (1997) on Child 24’s other clusters (obstruent+liquid,
/s/+stop, /s/+nasal) reveal that this is not the case, but this is tangential to the point
being made here.
32. The output segments in (32) abstract away from consonant harmony which applies
in some of the data; however, one can always reconstruct what the target segments
are in such forms.
References
Alderete, J. (1995). Faithfulness to prosodic heads. Paper presented at The Derivational
Residue in Phonology Conference, Tilburg University.
Barlow, J. (1997). A Constraint-based Account of Syllable Onsets: Evidence from
Developing Systems. Ph.D. dissertation, Indiana University.
(1999). An argument for adjuncts: evidence from a phonologically disordered sys-
tem. In A. Greenhill, H. Littlefield, and C. Tano (eds.) Proceedings of the 23rd
Bruce Hayes
1. Introduction
The study of phonological acquisition at the very earliest stages is making no-
table progress. Virtuosic experimental work accessing the linguistic knowledge
of infants has yielded extraordinary findings demonstrating the precocity of
some aspects of acquisition. Moreover, phonologists now possess an important
resource, Optimality Theory (OT) (Prince and Smolensky 1993), which per-
mits theorising to relate more closely to the findings of experimental work. The
purpose of this chapter is to outline one way in which these experimental and
theoretical research lines can be brought closer together. The central idea is that
current phonological theory can, without essential distortion, be assigned an
architecture that conforms closely to the process of acquisition as it is observed
in children. I conclude with a speculative, though reasonably comprehensive,
picture of how phonological acquisition might proceed.
2. Empirical focus
To avoid confusion, I will try to make clear that my view of what ‘phonological
acquisition’ involves may be broader than the reader is predisposed to expect.
When we study how very young children learn language, we can follow two
paths. One is to examine what children say; the other is to develop methods
that can determine what children understand or perceive. The reason these two
methods are so different is that (by universal consensus of researchers) acqui-
sition is always more advanced in the domain of perception than production:
children often cannot utter things that they are able to perceive and understand.
A fairly standard view of children’s productions (e.g., Smith 1973) is that the
internalised representations that guide children are fairly accurate,1 and that the
child carries out her own personal phonological mapping (Kiparsky and Menn
1977) which reduces the complex forms she has internalised to something that
can be more easily executed within her limited articulatory capacities. The study
of this mapping is a major research area. For literature review, see Gerken (1994:
792–799), and for some recent contributions Levelt (1994), Fikkert (1994),
Phonological acquisition in Optimality Theory 159
Pater (1997), Bernhardt and Stemberger (1998), Boersma (2000), and various
chapters in this volume.2
But it is also important to consider acquisition from another point of view,
focusing on the child’s internalised conception of the adult language. As just
noted, this will often be richer and more intricate than can be detected from
the child’s own speech. Indeed, the limiting case is the existence (see below)
of language-particular phonological knowledge in children who cannot say
anything at all. The focus of this chapter is the child’s conception of the adult
language, a research topic which can perhaps be fairly described as neglected
by phonologists.
To clarify what is meant here, consider the classic example of blick [blɪk]
vs. *bnick [bnɪk] (Chomsky and Halle 1965). Speakers of English immediately
recognise that blick is non-existent but possible, whereas bnick is both non-
existent and ill-formed; it could not be a word of English. This is a purely
passive form of linguistic knowledge, and could in principle be learned by
an infant before she ever was able to talk. As we shall see shortly, there is
experimental evidence that this is more or less exactly what happens.
I advocate, then, a clear separation between the child’s phonological analy-
sis of the ambient language vs. her personal production phonology. This view
can be opposed, for example, to that of Smolensky (1996a), who takes the (a
priori, rather appealing) view that the child’s grammars for production and
perception are the same. I will argue that this cannot be right: children whose
production rankings generate very primitive outputs – or none at all – never-
theless can pass the ‘blick’ test. They could not do this unless they had also
internalised an adult-like constraint ranking, separate from their production
grammar.3
for (say) [i], the Guenther/Gjaja model would learn a perceptual magnet in this
location. The model mimics the behaviour of humans with respect to perceptual
magnets in a number of different ways.
As Kuhl (1995) has pointed out, a very appealing aspect of the ‘percep-
tual magnet’ concept is that it represents a form of information that can be
learned before any words are known. In any phonemic system, the phonetic
tokens of actual speech are distributed unevenly. By paying attention to these
asymmetries, and by processing them (perhaps in the way Guenther and Gjaja
suggest), the child can acquire what I will here call distributional protocate-
gories. These protocategories are not themselves phonemes, but as Kuhl points
out, they could in principle serve as discrete building-blocks for the later con-
struction of a true phonological system. Thus, for example, some distributional
protocategories may turn out to be only strongly differentiated allophones of
the same phoneme. These are only later united into a single category during the
next phase of learning, when the child discovers that the protocategories have a
predictable phonological distribution. The means by which this might be done
are explored below.
4. Phonological knowledge
To clarify this task, it will help to review received wisdom about what kinds
of phonological knowledge adult speakers possess. Note that we are speaking
here only of unconscious knowledge, deduced by the analyst from linguistic
behaviour and from experimental evidence. Overt, metalinguistic knowledge is
ignored here throughout.
There are basically three kinds of phonological knowledge. For each, I will
review how such knowledge is currently described formally in Optimality The-
ory (Prince and Smolensky 1993), the approach to phonology adopted here.6
4.1 Contrast
To start, phonological knowledge includes knowledge of the system of contrasts:
the speaker of French tacitly knows that [b] and [p], which differ minimally in
voicing, contrast in French; that is, they can distinguish words such as [bu] ‘end’
vs. [pu] ‘louse’. Korean also possesses [b] and [p], but the speaker of Korean
tacitly knows that they are contextually predictable variants. Specifically, as
shown by Jun (1996), [b] is the allophone of /p/ occurring between voiced
sounds when non-initial in the Accentual Phrase.
In Optimality Theory, knowledge of contrasts and their distribution is re-
flected in the language-specific rankings (prioritisations) of conflicting con-
straints. For example, in French the faithfulness constraint of the Ident fam-
ily that governs voicing outranks various markedness constraints that govern
the default distribution of voicing. This permits representations that differ in
voicing to arise in the output of the grammar. In Korean, the opposite ranking
holds, with Markedness over Faithfulness; thus even if Korean had underlying
forms that differed in voicing, the grammar would alter their voicing to the
phonological defaults; thus no contrast could ever occur in actual speech.7
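The effect of the two rankings can be made concrete with a small sketch. It is a toy illustration under stated assumptions: the candidate set contains only the input and a version with the initial stop's voicing flipped, and the markedness constraint is a context-free ban on voiced obstruents, which is simpler than the contextual Korean distribution described above. All function names are hypothetical.

```python
VOICED_OBSTRUENTS = {"b", "d", "g"}
DEVOICE = {"b": "p", "d": "t", "g": "k"}
VOICE = {voiceless: voiced for voiced, voiceless in DEVOICE.items()}


def ident_voice(inp, out):
    """Faithfulness: one violation per segment that differs from the input."""
    return sum(1 for i, o in zip(inp, out) if i != o)


def no_voiced_obstruent(inp, out):
    """Markedness: one violation per voiced obstruent in the output."""
    return sum(1 for seg in out if seg in VOICED_OBSTRUENTS)


def candidates(inp):
    """The faithful candidate plus one with the initial stop's voicing flipped."""
    flipped = DEVOICE.get(inp[0]) or VOICE.get(inp[0])
    return [inp] + ([flipped + inp[1:]] if flipped else [])


def optimal(inp, ranking):
    """Pick the candidate with the best violation profile, compared
    constraint by constraint in ranking order (strict domination)."""
    return min(candidates(inp),
               key=lambda out: tuple(c(inp, out) for c in ranking))


faith_over_mark = [ident_voice, no_voiced_obstruent]   # contrast survives
mark_over_faith = [no_voiced_obstruent, ident_voice]   # contrast neutralised

for inp in ("bu", "pu"):
    print(inp, optimal(inp, faith_over_mark), optimal(inp, mark_over_faith))
```

With Faithfulness on top the two inputs surface distinctly; with Markedness on top both surface with the default voiceless stop, so no contrast can be observed.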
In some cases, the situation is more complex than what was just described:
the ranking of constraints is such that a contrast is allowed only in particular
contexts. Thus, French generally allows for a voicing distinction in stops, but
there is a high-ranking markedness constraint that requires voicing agreement
in obstruent clusters. This constraint outranks Faithfulness for stop voicing,
so that the contrast is suspended in certain contexts. For instance, there is no
voicing contrast after an initial [s]; there are pairs like [bu] vs. [pu], but no pairs
like [spesjal] (‘spéciale’) vs. *[sbesjal].
It will be important to bear in mind that in mainstream Optimality Theory,
constraint ranking is the only way that knowledge of contrast is grammatically
4.3 Alternation
The third and remaining kind of phonological knowledge is knowledge of the
pattern of alternation: the differing realisations of the same morpheme in vari-
ous phonological contexts. To give a commonplace example, the plural ending
of English alternates: in neutral contexts it is realised as [z], as in cans [kænz];
but it is realised as [s] when it follows a voiceless consonant: caps [kæps].
The [s] realisation is related to the phonotactics in an important way: English
does not tolerate final sequences like [pz], in which a voiced obstruent follows a
voiceless one. For example, there are monomorphemic words like lapse [læps],
but no words like *[læpz].
Optimality Theory treats most alternations as the selection of an output can-
didate that deviates from the underlying form in order to conform to a phono-
tactic pattern. In this way, it establishes an especially close relationship between
phonotactics and alternation. Thus, for underlying /kæp+z/, the winning candidate
is [kæps], in which the underlying value of [voice] for /z/ is altered in
order to obey the markedness constraint that forbids final heterovoiced obstruent
clusters. We shall return to the connection between phonotactics and alternation
below.
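The plural example can be sketched as a two-candidate evaluation. This is a simplified illustration, not a full analysis: only the suffix is allowed to change, the [ɪz] allomorph found after sibilants is ignored, and the segment classes and function names are assumptions of the sketch.

```python
VOICELESS = set("ptkfsθʃ")
OBSTRUENTS = set("ptkbdgfvszθðʃʒ")


def agree_final(out):
    """Markedness: penalise a word-final obstruent cluster whose two
    members differ in voicing."""
    if len(out) < 2:
        return 0
    a, b = out[-2], out[-1]
    both_obstruents = a in OBSTRUENTS and b in OBSTRUENTS
    return int(both_obstruents and (a in VOICELESS) != (b in VOICELESS))


def ident_voice_suffix(out):
    """Faithfulness: penalise a suffix that is not the underlying /z/."""
    return 0 if out.endswith("z") else 1


def plural(stem):
    """Evaluate stem+z and stem+s with Markedness ranked over Faithfulness."""
    candidates = [stem + "z", stem + "s"]
    return min(candidates,
               key=lambda out: (agree_final(out), ident_voice_suffix(out)))


for stem in ("kæn", "kæp"):
    print(stem, plural(stem))
```

The faithful candidate wins wherever it is phonotactically legal; the unfaithful [s] is chosen only when the faithful candidate would create a final heterovoiced cluster.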
constraint that must be ranked lower in order for underlying forms to be altered
to fit the phonotactics. By way of contrast, earlier rule-based approaches require
the learner to find both structural description and change for every alternation,
with no help from phonotactic knowledge.
The ‘Wug’-testing study of Berko (1958) suggests that children actually do
make practical use of their phonotactic knowledge in learning alternations.
Among the various errors Berko’s young subjects made, errors that violate
English phonotactics, such as *[wʌgs] or *[gʌtʃs] (Berko, pp. 162–163) were
quite rare. This observation was confirmed in more detail in the later work
of Baker and Derwing (1982). In the view adopted here, the children’s greater
reliability in this area results from their having already learned the phonological
constraints that ban the illegal sequences.
Summing up, it would appear that the OT answer to the conspiracy problem
is more than just a gain in analytical generality; it is the basis of a plausible
acquisition strategy.
of the constraints that will generate only winners. Thus, Constraint Demotion
has the ability to detect failed constraint sets.15 A constraint set also fails if it
is insufficiently rich, assigning identical violations to a winner and rival.
The Constraint Demotion algorithm is, in my opinion, an excellent contribu-
tion, which opens many avenues to the study of phonological learning. However,
it is not suited to the task of pure phonotactic learning, as I shall demonstrate.
The next few sections of the chapter are laid out as follows. In sections 7.2
and 7.3, I present a simple data example, ‘Pseudo-Korean’, to be used as an
illustration for the ranking algorithms. Section 7.4 applies Constraint Demotion
to Pseudo-Korean, and shows how it is unable to learn its phonotactics. Sections
7.5–7.7 lay out my own algorithm, and section 7.8 shows how it learns the
Pseudo-Korean pattern.
(4)
*[+voice][−voice][+voice] (abbreviation: *[+v][−v][+v])
This constraint bans voiceless segments surrounded by voiced ones. The
teleology of the constraint is presumably articulatory: forms that obey this
constraint need not execute the laryngeal gestures needed to turn off voicing
in a circumvoiced environment. For evidence bearing on this point from an
aerodynamic model, see Westbury and Keating (1986).
With these two constraints in hand, we may consider their role in Pseudo-
Korean. As will be seen, the faithfulness constraints for voicing are ranked
so low as to make no difference in Pseudo-Korean; therefore voicing is al-
lophonic. The distribution of voicing is thus determined by the ranking of
the markedness constraints. In particular, *[+v][−v][+v] must dominate
*[–son, +voice], so that obstruents will be voiced in voiced surroundings:
(5)
/ada/   | *[+v][–v][+v] | *[–son, +voice]
[ada]   |               | *
*[ata]  | *!            |
Voicing and aspiration are inherently not very compatible, and indeed most
languages lack voiced aspirates. Note that *dʰ bans a subset (a particularly
difficult subset) of the cases banned by *Aspiration.
In Pseudo-Korean, *dʰ must be ranked above *[+v][–v][+v]; otherwise,
aspirated stops would be voiced intervocalically.
(8)
/atʰa/  | *dʰ | *[+v][–v][+v]
[atʰa]  |     | *
*[adʰa] | *!  |
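Tableaux (5) and (8) can be checked mechanically. The following sketch encodes the three markedness constraints as functions over lists of segments (lists are used so that an aspirated stop counts as one segment) and compares violation profiles in ranking order. The segment inventory and the function names are assumptions made for the illustration.

```python
VOICED = {"a", "d", "dʰ"}            # toy inventory: vowels and voiced stops
VOICED_OBSTRUENTS = {"d", "dʰ"}


def star_dh(form):
    """*dʰ: one violation per voiced aspirate."""
    return form.count("dʰ")


def star_vtv(form):
    """*[+v][-v][+v]: one violation per voiceless segment flanked by voiced ones."""
    return sum(1 for i in range(1, len(form) - 1)
               if form[i] not in VOICED
               and form[i - 1] in VOICED and form[i + 1] in VOICED)


def star_voiced_obstruent(form):
    """*[-son, +voice]: one violation per voiced obstruent."""
    return sum(1 for seg in form if seg in VOICED_OBSTRUENTS)


# Ranking motivated by tableaux (5) and (8).
RANKING = [star_dh, star_vtv, star_voiced_obstruent]


def winner(candidates):
    """Return the candidate whose violation profile is best under RANKING."""
    return min(candidates, key=lambda form: tuple(c(form) for c in RANKING))


print(winner([["a", "d", "a"], ["a", "t", "a"]]))      # tableau (5)
print(winner([["a", "tʰ", "a"], ["a", "dʰ", "a"]]))    # tableau (8)
```

Because Python compares tuples position by position, a single violation of a higher-ranked constraint outweighs any number of violations lower down, which is the behaviour strict domination requires.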
(11)
/tʰa/  | Id(asp)/__V | *Asp
[tʰa]  |             | *
*[ta]  | *!          |
(12)
*/atʰ/ | *Asp | Id(asp)
[at]   |      | *
*[atʰ] | *!   |
[atʰa] *
*[ada] *!
*[adʰa] *!
(17)
/ata/  | *[+v][–v][+v] | *[–son, +voice] | Id(voice)/__V | Id(voice)
[ada]  |               | *               | *             | *
*[ata] | *!            |                 |               |
*[+v][−v][+v]    *Asp
*[−son, +voice]
An adequate ranking algorithm must learn at least these rankings. It may also
without harm posit additional ranking relationships, so long as they do not
conflict with those of (18).
the simulation were legal surface forms of Pseudo-Korean; thus, the training
set provided only positive evidence.
The basis of the bad outcomes is not hard to see: since all the faithfulness
constraints are at the top of the hierarchy, it is always possible to generate an
output that is identical (or at least, very similar) to an illegal input. Moreover,
this is an inevitable result, given the nature of Constraint Demotion as applied
to learning data of the type considered here.
Recall now what we wanted our grammar to do: given a legal input, it should
simply reproduce it as an output; and given an illegal input, it should alter
it to form a legal output. It is evident that the ranking learned by Constraint
Demotion succeeds in the first task, but not the second.
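The two tasks can be separated in a short sketch. It compares a hierarchy with faithfulness on top against one with markedness on top, using a legal input /ada/ and an illegal input /ata/. The two-candidate set and all names are assumptions of the illustration; neither ranking is claimed to be the exact output of any learning algorithm.

```python
VOICED = set("ad")


def ident_voice(inp, out):
    """Faithfulness: one violation per segment that differs from the input."""
    return sum(1 for i, o in zip(inp, out) if i != o)


def star_vtv(inp, out):
    """*[+v][-v][+v]: voiceless segment between voiced segments."""
    return sum(1 for i in range(1, len(out) - 1)
               if out[i] not in VOICED
               and out[i - 1] in VOICED and out[i + 1] in VOICED)


def star_voiced_obstruent(inp, out):
    """*[-son, +voice]: one violation per [d]."""
    return out.count("d")


CANDIDATES = ["ata", "ada"]


def output(inp, ranking):
    return min(CANDIDATES, key=lambda out: tuple(c(inp, out) for c in ranking))


faith_top = [ident_voice, star_vtv, star_voiced_obstruent]
mark_top = [star_vtv, star_voiced_obstruent, ident_voice]

for inp in ("ada", "ata"):
    print(inp, output(inp, faith_top), output(inp, mark_top))
```

Both hierarchies reproduce the legal input, but only the one with markedness on top repairs the illegal input; the faithfulness-first hierarchy lets *[ata] through, which is the overgeneration described above.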
7.7.1 Avoid Preference for Losers
This principle is taken from classical
Constraint Demotion. It forbids the admission to the current stratum of con-
straints that prefer a losing to a winning candidate. It plainly must be an ‘undom-
inated’ ranking principle, since if one admits such a constraint to the current
stratum, the finished grammar will itself prefer losers and generate incorrect
outputs.
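As a minimal sketch, Avoid Preference for Losers amounts to a filter over winner-rival pairs. Violation profiles are represented here as dictionaries from constraint names to counts; this representation and the function names are assumptions of the sketch. The single pair is the one from tableau (5).

```python
def prefers_loser(constraint, pairs):
    """True if, for some pair, the winner violates the constraint more
    often than its rival does."""
    return any(winner.get(constraint, 0) > rival.get(constraint, 0)
               for winner, rival in pairs)


def eligible(constraints, pairs):
    """Constraints that may be admitted to the current stratum."""
    return [c for c in constraints if not prefers_loser(c, pairs)]


# Winner [ada] vs. rival *[ata], as in tableau (5).
pairs = [({"*[-son,+voice]": 1}, {"*[+v][-v][+v]": 1})]
constraints = ["*[+v][-v][+v]", "*[-son,+voice]"]

print(eligible(constraints, pairs))
```

Here *[–son, +voice] is held back because the winner violates it and the rival does not; it can be ranked only once a higher stratum has accounted for that rival.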
7.7.2 Favour Markedness
Suppose that, after we have culled out the constraints that prefer losers,
we are left with both faithfulness and markedness
constraints. In such cases, Low Faithfulness Constraint Demotion installs only
the markedness constraints in the new stratum. The faithfulness constraints must
await a later opportunity to be ranked, often the next stratum down. Only when
the eligible set consists entirely of faithfulness constraints may faithfulness
constraints be considered for ranking.
Here is the rationale: often a rival candidate can be ruled out either because
it violates a markedness constraint, or because it is unfaithful. In such cases,
we want the markedness constraint to do the job, because if we let faithfulness
do it, it is likely to lead to overgeneration in the finished grammar.
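Favour Markedness can be sketched as a one-step selection over the constraints that survive the previous filter. The convention that markedness constraints are those whose names begin with an asterisk is an assumption adopted only for this illustration.

```python
def is_markedness(name):
    """Assumed naming convention: markedness constraints start with '*'."""
    return name.startswith("*")


def favour_markedness(eligible):
    """Return only the markedness constraints if any are eligible;
    otherwise pass the faithfulness constraints on unchanged."""
    marked = [c for c in eligible if is_markedness(c)]
    return marked if marked else list(eligible)


print(favour_markedness(["*Asp", "Id(asp)"]))
print(favour_markedness(["Id(asp)", "Id(voice)"]))
```

When the function returns faithfulness constraints, they are candidates only; the later criteria decide which of them are actually installed.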
7.7.5 Favour Autonomy
If the algorithm has gotten this far, it has culled
the set of rankable constraints down to a set of active faithfulness constraints,
none of which is a more general version of some more specific constraint. If this
set consists of just one member, it can be selected as the sole constraint of the
current stratum, and we can move on. But what if more than one faithfulness
constraint is still eligible? It would certainly be rash to install all of the eligible
constraints, because it is quite possible that we can ward off overgeneration by
installing just some of them.
My proposal is that there should be a criterion of autonomy. As an approximation,
we can say that for an eligible faithfulness constraint Fi to be installed in
the current stratum, it should exclude at least one rival R autonomously, acting
without help from any other constraint C. By ‘help’, I mean that C, working
alone, would likewise exclude R (in other words, R violates C more than the
winner). This is the core idea, and I will try to justify it below.
Before doing so, I must add some details. First, we must consider the possi-
bility that there might not be any constraints that exclude a rival without help.
In such cases, I propose that we should loosen the criterion, seeking constraints
that exclude some rival with the help of just one other constraint, or (failing
that) just two, and so on. In other words, autonomy is relative, not absolute.
Eventually, the criterion of relative autonomy will select a constraint or set of
constraints that can be installed in the new stratum.
(25)
/ada/  | Id(asp)/__V | *[+v][–v][+v] | *asp | *[–son/+vce] | Id(vce)/__V
[ada]  |             |               |      | *            |
*[ata] |             | *             |      |              | *
From this pattern, it is evident that if we let Id(vce)/__V join the next
stratum, *[ata] will be ruled out. But Id(vce)/__V has a helper, namely
*[+v][−v][+v], so perhaps it is really *[+v][−v][+v] (yet unrankable)
that is ultimately responsible for ruling out *[ata]. We cannot know at this stage.
As it turns out in the end, it would be rash to pick Ident(voice)/__V,
because in the correct grammar ((18) above), it emerges that *[+v][−v][+v]
must outrank Ident(voice)/__V to avoid overgeneration.
By way of comparison, (26) shows the violation pattern for winner [tʰa] vs.
rival *[ta], at the same stage of learning:
(26)
Here, Ident(asp) / ___V rules out *[ta] alone, with no help from any other
eligible constraint. Its autonomy constitutes plain evidence that Ident(asp)
/ ___V needs to be ranked high. Favour Autonomy thus selects it for the stratum
under construction, which, as it turns out, permits the rest of the grammar to be
constructed straightforwardly (see section 7.8).
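
The helper-counting idea behind Favour Autonomy can be illustrated with a minimal Python sketch. It is not the code used for the simulations reported in this chapter. The function names, the dictionary encoding of violations, and the ASCII constraint labels are hypothetical. The violation profile for (25) follows the discussion above; the profile for (26) is an assumption (winner violates *asp, rival violates Ident(asp) / ___V), since only its outcome is described in the text.

```python
from typing import Dict, List, Optional, Tuple

Violations = Dict[str, int]           # constraint name -> number of violations
Pair = Tuple[Violations, Violations]  # (winner, rival)


def prefers_winner(c: str, winner: Violations, rival: Violations) -> bool:
    """True if the rival violates constraint c more than the winner does."""
    return rival.get(c, 0) > winner.get(c, 0)


def fewest_helpers(f: str, unranked: List[str], pairs: List[Pair]) -> Optional[int]:
    """Smallest number of helpers f has on any rival it excludes.

    Returns None if f excludes no rival at all.
    """
    counts = [
        sum(1 for c in unranked if c != f and prefers_winner(c, w, r))
        for w, r in pairs
        if prefers_winner(f, w, r)
    ]
    return min(counts) if counts else None


def favour_autonomy(eligible: List[str], unranked: List[str],
                    pairs: List[Pair]) -> List[str]:
    """Return the eligible constraints with the lowest fewest-helper score."""
    scores = {f: fewest_helpers(f, unranked, pairs) for f in eligible}
    active = {f: h for f, h in scores.items() if h is not None}
    if not active:
        return []
    best = min(active.values())
    return [f for f in eligible if active.get(f) == best]


# Constraints still unranked at this stage (ASCII labels are ad hoc).
unranked = ["Id(asp)/_V", "*[+v][-v][+v]", "*asp", "*[-son/+vce]", "Id(vce)/_V"]

# (25): winner [ada] vs. rival *[ata] for /ada/.
pair_25 = ({"*[-son/+vce]": 1}, {"*[+v][-v][+v]": 1, "Id(vce)/_V": 1})
# (26): winner [tʰa] vs. rival *[ta] for /tʰa/ (assumed violation profile).
pair_26 = ({"*asp": 1}, {"Id(asp)/_V": 1})

eligible = ["Id(asp)/_V", "Id(vce)/_V"]
selected = favour_autonomy(eligible, unranked, [pair_25, pair_26])
```

Under these assumptions, Ident(asp) / ___V scores zero helpers and Ident(vce) / ___V scores one, so only the former is returned. Because the score is a minimum over rivals, the criterion is relative in the sense described above: if no constraint reaches zero helpers, the lowest available score still selects something.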
Summing up: Favour Autonomy, by seeking the examples where a constraint
acts with the fewest possible helpers, singles out the cases where a constraint
is most clearly shown to require a high ranking. By only allowing maximally
autonomous faithfulness constraints to be installed, we can limit installation to
cases that are most likely to be truly necessary, letting the less autonomous con-
straints sink further down to a point where they will not lead to overgeneration.
Favour Autonomy, being relatively defined, will always place at least one
constraint in the current stratum. Thus, when invoked, it is the last criterion
used in stratum formation. Once the stratum is created, Low Faithfulness Con-
straint Demotion continues just like classical Constraint Demotion: rival forms
explained by the constraints of the new stratum are culled from the learning set,
and the next stratum down is created by invoking the criterion hierarchy again,
beginning with Avoid Preference for Losers.
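
The culling step and the Avoid Preference for Losers check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the dictionary encoding are hypothetical, and the single winner-rival pair is the /ada/ comparison of (25) with the violations described in the text.

```python
from typing import Dict, List, Tuple

Violations = Dict[str, int]           # constraint name -> number of violations
Pair = Tuple[Violations, Violations]  # (winner, rival)


def prefers_loser(constraint: str, pairs: List[Pair]) -> bool:
    """True if, for some remaining pair, the winner violates the
    constraint more than the rival does."""
    return any(w.get(constraint, 0) > r.get(constraint, 0) for w, r in pairs)


def cull(stratum: List[str], pairs: List[Pair]) -> List[Pair]:
    """Drop every pair whose rival is excluded by some constraint of the
    newly created stratum (the rival violates it more than the winner)."""
    return [
        (w, r) for w, r in pairs
        if not any(r.get(c, 0) > w.get(c, 0) for c in stratum)
    ]


# Winner [ada] vs. rival *[ata] for /ada/, as in (25).
pairs = [({"*[-son/+vce]": 1}, {"*[+v][-v][+v]": 1, "Id(vce)/_V": 1})]

blocked_before = prefers_loser("*[-son/+vce]", pairs)
remaining = cull(["*[+v][-v][+v]"], pairs)
blocked_after = prefers_loser("*[-son/+vce]", remaining)
```

Here the voiced-obstruent constraint prefers a loser until the stratum containing *[+v][−v][+v] has been installed and the rival *[ata] has been culled; after that it prefers no loser and becomes rankable.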
autonomy for these constraints was already illustrated in (25) and (26), which
show the fewest-helper cases for each. Since Ident(asp) / ___V is the
more autonomous (zero helpers to one), it is selected as the sole member of
Stratum 2. The rival candidates that it explains, such as *[ta] for /tʰa/, are
excluded from the learning set.
Stratum 3: We first evaluate Avoid Preference for Losers: *[–son/+voice]
still prefers *[ata] for underlying /ada/, and so must remain unranked.
But two other markedness constraints now prefer no losers among the
remaining data. One such constraint is *[+v][−v][+v]. Earlier it preferred
*[ada] for /atʰa/, but that form is now explained by Ident(asp) / ___V. Therefore,
Avoid Preference for Losers permits *[+v][−v][+v] to join the new
stratum. Likewise, *Asp no longer prefers losers (*[ta] for /tʰa/ is now ruled
out by Ident(asp) / ___V), so it too may join the third stratum. The remaining
constraints are faithfulness constraints, and they are shut out by Favour
Markedness.
Once *[+v][−v][+v] is placed in Stratum 3, then *[ata] for /ada/ is explained
and excluded from the learning set. This vindicates the earlier decision
((25)) not to place Id(vce) / ___V in Stratum 2.
Stratum 4: Avoid Preference for Losers: *[–son/+voice] no longer
prefers a loser, since *[ata] for [ada] is now ruled out by *[+v][−v][+v].
Thus *[–son/+voice] may now be ranked in the fourth stratum. Since
*[−son/+voice] is a markedness constraint, Favour Markedness contin-
ues to shut out the remaining faithfulness constraints.
Stratum 5: By this stage, all the rival candidates are excluded by some
ranked constraint. The remaining three constraints (Id(asp), Id(vce) / ___V,
and Id(vce)) prefer no losers, so are passed first to the jurisdiction of Favour
Markedness, then to Favour Activeness. The latter, noticing that none of
these constraints is active, invokes its special termination provision and dumps
them all into the bottom stratum. The procedure that checks for termination
(section 7.7.6) notices that all constraints are ranked and that all rival candi-
dates are ruled out, so it records success.
Summing up, the ranking obtained by Low Faithfulness Constraint Demotion
for Pseudo-Korean is as follows:
Comparing this with (18), it can be seen that all of the crucial rankings are
reflected in the stratal assignments. The distinction of Strata 1 and 2 (imposed
by Favour Markedness) does not reflect a crucial ranking, but no harmful
consequences result from ranking *dʰ over Ident(aspiration) / ___V.21
By inspection, the grammar of (29) reveals what is phonemic in Pseudo-
Korean stops: aspiration in prevocalic position. This is because Ident(asp)
/ ___V is the only faithfulness constraint that doesn’t reside at the bottom of
the grammar. Voicing is allophonic, and aspiration in non-prevocalic position
is likewise predictable.
As a check on the finished grammar, we can test to see if it overgenerates.
To do this, I fed the grammar the larger set of inputs which had earlier shown
that regular Constraint Demotion overgenerates. From these inputs, the new
grammar derived the following outputs:
(30)  Well-formed inputs         Ill-formed inputs
      input       output         input      output
      /ta/        [ta]           /da/       [ta]
      /ada/       [ada]          /dʰa/      [tʰa]
      /tʰa/       [tʰa]          /ata/      [ada]
      /atʰa/      [atʰa]         /adʰa/     [atʰa]
      /at/        [at]           /ad/       [at]
      /tada/      [tada]         /atʰ/      [at]
      /tatʰa/     [tatʰa]        /adʰ/      [at]
      /tʰada/     [tʰada]
      /tʰatʰa/    [tʰatʰa]
      /tat/       [tat]
      /tʰat/      [tʰat]
Specifically, all well-formed inputs are retained, and all ill-formed inputs are
‘fixed’; that is, converted by the grammar into a well-formed output.22 Thus,
Low Faithfulness Constraint Demotion succeeded in learning a ranking that
defines Pseudo-Korean phonotactics, based on only positive evidence.
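
The overgeneration check can be reproduced in outline with a small stratified evaluator. The Python sketch below is an illustration, not the software used for the chapter. Several things in it are assumptions: the ASCII encoding (T for tʰ, D for dʰ), the candidate set (every stop may surface as t, d, tʰ, or dʰ; no deletion or insertion), the concrete constraint definitions, the pooling of violations within a stratum, and the placement of *dʰ alone in Stratum 1, which is inferred from the remark about Strata 1 and 2 above.

```python
from itertools import product

STOPS = "tdTD"           # T = aspirated t, D = aspirated d (ad hoc encoding)
VOICED = set("adD")      # the vowel and the voiced stops
ASPIRATED = set("TD")


def prevocalic(form, i):
    """True if the segment at index i is immediately followed by a vowel."""
    return i + 1 < len(form) and form[i + 1] == "a"


# Markedness constraints: count violations in the output only.
def star_dh(inp, out):
    return out.count("D")

def star_asp(inp, out):
    return sum(s in ASPIRATED for s in out)

def star_voiced_obstruent(inp, out):
    return sum(s in "dD" for s in out)

def star_vtv(inp, out):
    """*[+v][-v][+v]: a voiceless segment flanked by voiced segments."""
    return sum(
        out[i] not in VOICED and out[i - 1] in VOICED and out[i + 1] in VOICED
        for i in range(1, len(out) - 1)
    )


# Faithfulness constraints: compare input and output segment by segment.
def ident(feature, only_prevocalic=False):
    def constraint(inp, out):
        return sum(
            (a in feature) != (b in feature)
            for i, (a, b) in enumerate(zip(inp, out))
            if not only_prevocalic or prevocalic(out, i)
        )
    return constraint


STRATA = [
    [star_dh],
    [ident(ASPIRATED, only_prevocalic=True)],
    [star_vtv, star_asp],
    [star_voiced_obstruent],
    [ident(ASPIRATED), ident(VOICED, only_prevocalic=True), ident(VOICED)],
]


def candidates(inp):
    """All outputs obtained by changing the laryngeal features of the stops."""
    slots = [STOPS if s in STOPS else s for s in inp]
    return ["".join(c) for c in product(*slots)]


def optimal(inp, strata):
    """Filter the candidate set stratum by stratum, pooling violations."""
    pool = candidates(inp)
    for stratum in strata:
        scores = {c: sum(con(inp, c) for con in stratum) for c in pool}
        best = min(scores.values())
        pool = [c for c in pool if scores[c] == best]
    return pool


# Input -> expected output, transcribed from (30).
TABLE_30 = {
    "ta": "ta", "ada": "ada", "Ta": "Ta", "aTa": "aTa", "at": "at",
    "tada": "tada", "taTa": "taTa", "Tada": "Tada", "TaTa": "TaTa",
    "tat": "tat", "Tat": "Tat",
    "da": "ta", "Da": "Ta", "ata": "ada", "aDa": "aTa",
    "ad": "at", "aT": "at", "aD": "at",
}

outputs = {inp: optimal(inp, STRATA) for inp in TABLE_30}
```

With these definitions each input in (30) yields a single optimal candidate. The sketch covers only laryngeal repairs, so it says nothing about grammars that repair illegal forms by epenthesis or deletion.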
I have tried out Low Faithfulness Constraint Demotion on a number of data
files similar in scope to Pseudo-Korean.23 In these data files, it succeeds in
producing ‘tight’ grammars, which generate only forms that match (or are less
marked than)24 those given to it in the input.
7.9 Caveats
To recapitulate: infants are apparently able to learn a great deal about the phonol-
ogy of the ambient language – specifically, the phonotactics – in the absence
of negative evidence. Moreover, they most likely accomplish this learning with
7.9.1 Gradient well-formedness The algorithm cannot deal with the fact
that judgements of phonotactic well-formedness are gradient (Algeo 1978); for
example, a form like ?[dwɛf] seems neither perfectly right nor completely ill-
formed. Moreover, such patterns appear to be learned by infants. An example
is the unusual status of final stress in English polysyllabic words, e.g., ballóon.
Jusczyk, Cutler, and Redanz (1993) give evidence that this pattern is learned
by infants at the same time they are learning the exceptionless patterns.
There is an algorithm that has proved capable of treating gradient well-
formedness, namely the Gradual Learning Algorithm of Boersma (1997),
applied to gradient well-formedness in Boersma and Hayes (2001). I have not
yet succeeded in incorporating a suitable downward bias for Faithfulness into
this algorithm.
8.2.3 A stage of vulnerability If the view taken here is correct, then children
often go through a stage of innocent delusion: they wrongly believe that certain
phones which are lawfully distributed according to a grammatical environment
are separate phonemes. The effects of this errorful stage can be seen, I think,
in cases where the erroneous belief is accidentally cemented in place by the
effects of dialect borrowing.
Consider the varieties of American English noted above in which writer
[ rɾ] and rider [ raɾ] form a minimal pair. As just mentioned, they can be
(1999: 20–22, this volume) and presented by them as a problem for Biased
Constraint Demotion; and (3) a simulation of the legal vowel sequences of
Turkish: rounding may occur on noninitial vowels only when they are high and
preceded by a rounded vowel in the previous syllable. Since detailed discussion
of these simulations would exceed the space available here, I refer the interested
reader instead to a Web page from which full descriptions of the simulations
can be downloaded (http://www.linguistics.ucla.edu/people/hayes/acquisition).
The result of all three simulations was that Low Faithfulness Constraint Demo-
tion learned the correct (i.e., non-overgenerating) grammar, and Biased Con-
straint Demotion failed to learn this grammar; that is, learned a less-restrictive
ranking.
This difference is less than meets the eye, however, because Low Faithfulness
Constraint Demotion makes use of a ranking principle that Prince and Tesar de-
liberately avoid, namely Favour Specificity (section 7.7.4). Favour Specificity
could easily be incorporated into Biased Constraint Demotion. In fact, I have
experimented with doing this, and I find that when Biased Constraint Demotion
is thus modified, it learns all three simulations correctly.
My simulations suggest a diagnosis of the problem that is faced by Biased
Constraint Demotion when Favour Specificity is not included: general faith-
fulness constraints, because they are general, tend to free up more markedness
constraints down the line. This problem was also noticed by Prince and Tesar
(1999: note 9, this volume).
Given that both algorithms benefit from Favour Specificity, it is worth consid-
ering Prince and Tesar’s objections to it. The crucial point (see their discussion
of the ‘ pa to’ language; 1999: 19–20, this volume) is that we cannot always
read special-to-general relations off the structural descriptions of the constraints
themselves (i.e., a superset structural description singles out a subset of forms).
In some cases, the special–general relation is established only when we consider
the action of higher ranked constraints. Thus, in the ‘ pa to’ language, where
all initial syllables, and a number of other syllables as well, are stressed, the
environment ‘in initial syllables’ is special relative to the general environment
‘in stressed syllables’. Favour Specificity cannot be applied here, given that the
contingent special–general pattern is not yet known.
The answer to this problem, I suggest, may lie in having the learning algo-
rithm work harder. Contingent subset relations among faithfulness constraints
are, I believe, learnable from the input data even where the constraint ranking is
yet unknown, using computations whose scope would be extensive but hardly
astronomical. In particular, if an algorithm kept track of the comparative ap-
plicability of the faithfulness constraint environments to the available learning
data, it would emerge clearly after only a few thousand words that the structural
description ‘initial syllable’ in the ‘ pa to’ language never targets material not
also targeted by the structural description ‘stressed syllable’.31
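
The bookkeeping this would require is modest. The Python sketch below follows the chart method of note 31: a cell (i, j) is marked whenever some material falls in context i but not in context j. The data format (each word as a mapping from context name to the set of syllable positions it picks out), the function names, and the two sample words are hypothetical and serve only to illustrate the tabulation.

```python
from typing import Dict, List, Set, Tuple

Word = Dict[str, Set[int]]  # context name -> positions the context picks out


def tabulate(words: List[Word], contexts: List[str]) -> Set[Tuple[str, str]]:
    """Mark cell (i, j) whenever some word has material in context i
    that is not also in context j."""
    marked = set()
    for word in words:
        for i in contexts:
            for j in contexts:
                if i != j and word.get(i, set()) - word.get(j, set()):
                    marked.add((i, j))
    return marked


def relations(marked: Set[Tuple[str, str]], contexts: List[str]):
    """Read special-to-general and identical context pairs off the chart."""
    special, identical = [], []
    for p in contexts:
        for q in contexts:
            if p == q:
                continue
            blank_pq = (p, q) not in marked
            blank_qp = (q, p) not in marked
            if blank_pq and not blank_qp:
                special.append((p, q))       # p is special, q is general
            elif blank_pq and blank_qp and p < q:
                identical.append((p, q))     # contexts coincide in the data
    return special, identical


contexts = ["initial", "stressed"]

# Two invented words: stress always falls on the first syllable,
# and sometimes on a later syllable as well.
words = [
    {"initial": {0}, "stressed": {0}},
    {"initial": {0}, "stressed": {0, 2}},
]

marked = tabulate(words, contexts)
special, identical = relations(marked, contexts)
```

For these two words the chart has a mark only in the cell (stressed, initial), so 'initial' comes out as special relative to 'stressed'. A real learner would need enough words for blank cells to be trustworthy, which is the point of the estimate of a few thousand words given above.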
data sets, in which every rival candidate could arise as an error from a partial
grammar. Every simulation came out identically.
It remains an issue for the long term whether the use of rival candidates that
are not in the error-driven set aids – or perhaps hinders – phonotactic learning.
Notes
∗ For advice that helped to improve this chapter I would like to thank Adam Albright,
Sun-Ah Jun, René Kager, Patricia Keating, Alan Prince, Charles Reiss, Donca Steri-
ade, Bruce Tesar, Bernard Tranel, Colin Wilson, and audience members at Utrecht,
San Diego, Berkeley, and Johns Hopkins. All are absolved from any remaining short-
comings. Prince and Tesar (1999/this volume) was composed simultaneously with,
but independently from, the present chapter. The authors present a proposal about
phonotactic learning that is quite similar to what is proposed here. In clarifying and
focusing the original version of this paper (Hayes 1999c) for the present volume I
have benefited considerably from reading Prince and Tesar’s work. For a comparison
of the ranking algorithms proposed in the two papers, see Appendix A.
1. With important exceptions; see for instance Macken (1980), the literature review
in Vihman (1996: ch. 7), and Pater (this volume). What is crucial here is only that
perception has a wide lead over production.
2. Hale and Reiss (1998) likewise propose that the child’s output mapping is separate
from her phonological system per se. However, they go further in claiming that the
child’s mapping is utterly haphazard, indeed the result of the child’s ‘body’ rather
than her ‘mind’. I cannot agree with this view, which strikes me as an extraordinary
denigration of research in child phonology. To respond to two specific contentions:
(1) The free variation and near-neutralisations seen in the child’s output (Hale and
Reiss 1998: 669) are common in adult phonology, too. Whatever is developed
as a suitable account of these phenomena (and progress is being made) is likely
to yield insight into children’s phonology as well.
(2) Claimed differences between children’s constraints and adults’ (see Hale and
Reiss 1998 (18a)) can be understood once we see constraints (or at least, many of
them) as grammaticised principles that address phonetic problems. Since chil-
dren employ different articulatory strategies (such as favouring jaw movement
over articulator movement), they develop different (but overall, rather similar)
constraint inventories.
3. Separating the child’s internalised conception of the adult grammar from her produc-
tion grammar also helps to clarify various patterns in the child phonology literature.
For instance, before age 4;3 Gwendolyn Stemberger produced /ni d+d/ as [ni də d]
‘needed’, but /h +d/ as [hdd] ‘hugged’ (Bernhardt and Stemberger 1998: 651).
This mapping makes sense if /ni d+d/ → [ni də d] was part of Gwendolyn’s concep-
tion of adult phonology, but /h +d/ → [hdd] was part of her production mapping
from adult outputs to Gwendolyn-outputs. A similar example appears in Dinnsen
et al. (2000).
4. Best et al. (1988) have shown that English-learning infants do not have difficulty
in discriminating a click contrast of Zulu. This is probably unsurprising, given that
adult monolinguals can also discriminate contrasts that are not phonemic for them
when the phonetic cues are extremely salient. A further relevant factor is that English
has no existing phonemes that could be confused with clicks and would distort their
perception.
5. Examples from Jusczyk et al. (1993): Dutch *[rtum], English ?[ji d]. Many of
the sequences used in Jusczyk et al.’s experiment violate formalisable phonotactic
restrictions that are exceptionless in English; the others are sufficiently rare that
they could in principle be describable as ill-formed, from the point of view of the
restricted data available to the infant.
6. For reasons of space, I cannot provide a summary of Optimality Theory (OT), now
the common currency of a great deal of phonological research. A clear and thoughtful
introduction is provided in the textbook of Kager (1999b). Another helpful account,
oriented to acquisition issues, is included in Bernhardt and Stemberger (1998).
7. And, in fact, it is plausible to suppose that Korean learners would never uselessly
internalise underlying representations with contrastive voicing, since the distinction
could never be realised.
8. Evidence for rather impressive ability among infants to extract and remember words
and phrases from the speech stream is presented in Jusczyk and Aslin (1995), Jusczyk
and Hohne (1997), and Gomez and Gerken (1999).
9. The production phonologies of toddlers, mapping adult surface forms to simplified
child outputs, are also conspiratorial, as has been pointed out forcefully by Menn
(1983). This is a major rationale for current efforts to use Optimality Theory to
model these production phonologies.
10. Phonological alternations provide a weak form of negative evidence: the fact that
the [-z] suffix of cans [kænz] shows up altered to [-s] in caps [kæps] is a clue that
final *[pz] is not legal in English. It is for this reason that constraint ranking is
often an easier problem for alternation data than for purely phonotactic data. Given
that alternations can provide negative evidence, it is relevant that the learning of
alternations appears to happen relatively late (section 5); and also that in many
languages only a fraction of the phonotactic principles are supported by evidence
from alternations.
11. The reader should not scoff at the idea of a grammar being required to rule out
hypothetical illegal forms. To the contrary, I think such ability is quite crucial.
The real-life connection is speech perception: given the characteristic unclarity and
ambiguity of the acoustic input, it is very likely that the human speech perception
apparatus considers large numbers of possibilities for what it is hearing. To the
extent that some of these possibilities are phonotactically impossible, they can be
ruled out even before the hard work of searching the lexicon for a good match is
undertaken.
12. Plainly, there is a potential debt to pay here when we consider languages that have
elaborate systems of alternation at the phrasal level; for example, Kivunjo Chaga
(McHugh 1986) or Toba Batak (Hayes 1986). Here, one strategy that might work
well would be for the child to focus on one-word utterances, where the effects of
phrasal phonology would be at a minimum. Another possibility is for the child to
internalise a supply of short phrases, and learn their phonotactics without necessarily
parsing them.
13. I am grateful to Daniel Albro for suggesting this as a basis for pure phonotactic
learning.
14. In work not described here, I have examined the more sophisticated Error Driven
Constraint Demotion variant of the algorithm, obtaining essentially the same results
as are described below for the batch version.
15. Or, in some cases, to correct assumptions made earlier about ‘hidden structure’ in
the input data; Tesar and Smolensky (2000).
16. There is a current open research issue in OT: whether contextual information prop-
erly belongs within the markedness constraints or the faithfulness constraints. For
useful argumentation on this point, see Zoll (1998). The account here places con-
texts primarily in the faithfulness constraints; I have also tried a parallel simulation
using the opposite strategy, and obtained very similar results.
17. More accurately: sonorants, but for Pseudo-Korean I will stick with vowels for
brevity.
18. The algorithm given here is the same as the one in the earlier Web-posted version
of this article (Hayes 1999c). However, the presentation has been substantially re-
shaped, taking a lesson from the presentation of similar material in Prince and Tesar
(this volume). In addition to increasing clarity, the changes should also facilitate
comparison of the two algorithms.
19. When the constraint set is inadequate (cannot derive the winners under any ranking),
the dumping of the inactive faithfulness constraints into a default stratum will be
the penultimate phase. In the final phase, the algorithm learns that it has only loser-
preferring markedness constraints to work with, and is unable to form any further
strata. It thus terminates without yielding a working grammar (cf. section 7.1, 7.7.6).
20. Markedness constraints already assigned to a stratum can never be helpers in any
event: the rivals they explain have been culled from the learning set.
21. This is because the completed grammar maps underlying /dʰa/ to [tʰa], so that both
*dʰ and Ident(aspiration) / ___V are satisfied.
22. The reader may have noted that the ‘fixes’ imposed by the grammar conform to the
behaviour of alternating forms in real Korean. This outcome is accidental. A larger
Pseudo-Korean simulation, not reported here, included candidates with deletion
and insertion, and indeed uncovered grammars in which illegal forms were repaired
by vowel epenthesis and consonant deletion, rather than by alteration of laryngeal
feature values. For further discussion, see section 8.1.
23. Specifically: a file with the legal vowel sequences of (the native vocabulary of)
Turkish, a file embodying the azba problem of Prince and Tesar (1999/this volume),
and a family of files containing schematic ‘CV’ languages of the familiar type,
banning codas, requiring onsets, banning hiatus, and so on. See Appendix A for
further discussion.
24. Thus, for instance, when given a (rather unrealistic) input set consisting solely of
[CV.V], the algorithm arrived at the view that [CV.CV] is also well-formed. This
is because, given the constraints that were used, there was no ranking available that
would permit [CV.V] but rule out [CV.CV]. This fits in with a general prediction
made by Optimality Theory, not just Low Faithfulness Constraint Demotion: in any
language, a hypothetical form that incurs a subset of the Markedness violations of
any actual form should be well-formed.
25. Moreover, for purposes of such a proof, one would have to state precisely what
is meant by ‘effective’ in the context of pure phonotactic learning. One possible
definition is the subset definition adopted by Prince and Tesar (this volume): a
ranking R is the most effective if there is no other ranking R′ that covers the input
data and permits only a subset of the forms permitted by R. However, in the long run
I think our main interest should lie in an empirical criterion: a phonotactic learning
algorithm should make it possible to mimic precisely the well-formedness intuitions
of human speakers.
26. The reader who doubts this might further consider the effects of forms that are
not morphologically transparent. A child exposed to the children’s book character
‘Lowly [ loυ li] Worm’ will not necessarily be aware that most worms live under-
ground (Lowly doesn’t); and will take Lowly to form a near-minimal pair with, e.g.,
roly-poly [ roli poli]. Lack of morphological knowledge is particularly likely for
infants, who probably learn phonology in part on the basis of stored phonological
strings whose meaning is unknown to them (for evidence, see Appendix C).
27. Note that innovation of grammatically conditioned allophones probably arises his-
torically from the same effects seen synchronically in Marina. Had Marina been able
to transmit her innovation to the speech community as a whole, then Modern Greek
would have come to have [x] and [ç] as grammatically conditioned allophones.
28. An issue not addressed in the text is which of the child’s two emerging grammars
contains output-to-output correspondence constraints – is it her own production
phonology, or her conception of the adult system? If the discussion above is right,
they must occur at least in the internalised adult system, though perhaps they occur
in the production system as well. The crucial empirical issue, not yet investigated
to my knowledge, is this: do children like Marina and Gwendolyn recognise that
forms like *[ exete] or *[ s tŋ] would be aberrant coming from adults?
29. This appendix has benefited greatly from advice I have received from Colin Wilson;
he is absolved, however, of responsibility for errors. For Wilson’s own new proposals
in this area, see Wilson (2001).
30. Testing was greatly facilitated by software code for Biased Constraint Demotion
provided by Bruce Tesar as a contribution to the ‘OTSoft’ constraint ranking software
package (http://www.linguistics.ucla.edu/people/hayes/otsoft).
31. Method: form an n by n chart, where n is the number of contexts employed in
faithfulness constraints, and enter a mark in row i and column j whenever structural
material in some word is included in context i but not context j. After sufficient data
have been processed, any blank cell in row p, column q, together with a marked cell
in row q, column p, indicates that context p is in a special-to-general relationship
with context q. It can be further noted that paired blank cells in row p, column
q and row q, column p reveal contexts that – in the target language – turn out
to be identical. This information would also be very helpful to Low Faithfulness
Constraint Demotion. Accidentally duplicate constraints would be likely to distort
the crucial measure ‘number of helpers’ on which Favour Autonomy depends. An
environment-tabulation procedure of the type just described could be used to ward
off this problem.
32. Infinite candidate sets arise as the result of epenthesis; thus for underlying /sta/,
GEN could in principle provide [ə sta], [əə sta], [əəə sta], etc.
33. Jusczyk and Hohne (1997) show that 8-month-old infants recall words they heard
in tape-recorded stories when tested in a laboratory two weeks later. Since only a
small sampling of the words in the stories was tested, the number memorised by
the infants must be far larger. Gomez and Gerken (1999) supplement this result
by demonstrating the great speed and facility with which infants can internalise
potential words: 12-month-olds, on very brief exposure, can rapidly assimilate and
remember a set of novel nonsense words, well enough to use them in determining
the grammatical principles governing the legal strings of an invented language.
References
Aksu-Koç, A. A. and D. I. Slobin (1985). The acquisition of Turkish. In D. Slobin (ed.)
The Crosslinguistic Study of Language Acquisition, Vol. 1: the Data. Hillsdale,
N.J.: Lawrence Erlbaum Associates.
Albright, A. and B. Hayes (1998). An automated learner for phonology and morphology.
MS., Department of Linguistics, UCLA.
[http://www.linguistics.ucla.edu/people/hayes/learning]
Albro, D. (1997). Evaluation, implementation, and extension of Primitive Optimality
Theory. M.A. thesis, UCLA.
[http://www.linguistics.ucla.edu/people/grads/albro/papers.html]
Algeo, J. (1978). What consonant clusters are possible? Word 29. 206–224.
Baker, W. J. and B. L. Derwing (1982). Response coincidence analysis as evidence for
language acquisition strategies. Applied Psycholinguistics 3. 193–221.
Beckman, J. (1998). Positional Faithfulness. Ph.D. dissertation, University of
Massachusetts. [ROA 234, http://roa.rutgers.edu]
Benua, L. (1997). Transderivational Identity: Phonological Relations Between Words.
Ph.D. dissertation, University of Massachusetts.
[ROA 259, http://roa.rutgers.edu]
Berko, J. (1958). The child’s learning of English morphology. Word 14. 150–177.
Berman, R. A. (1985). The acquisition of Hebrew. In D. Slobin (ed.) The Crosslinguistic
Study of Language Acquisition, Vol. 1: the Data. Hillsdale, N.J.: Lawrence Erlbaum
Associates.
Bernhardt, B. H. and J. P. Stemberger (1998). Handbook of Phonological Develop-
ment from the Perspective of Constraint-Based Nonlinear Phonology. San Diego:
Academic Press.
Best, C. T., G. W. McRoberts, and N. M. Sithole (1988). Examination of the perceptual
re-organization for speech contrasts: Zulu click discrimination by English-speaking
adults and infants. Journal of Experimental Psychology: Human Perception and
Performance 14. 345–360.
Boersma, P. (1997). How we learn variation, optionality, and probability. Proceedings
of the Institute of Phonetic Sciences of the University of Amsterdam 21. 43–58.
(1998). Functional Phonology. The Hague: Holland Academic Graphics.
(2000). Learning a grammar in Functional Phonology. In Dekkers et al. (2000). 465–
523.
Boersma, P. and B. Hayes (2001). Empirical tests of the Gradual Learning Algorithm.
LI 32. 45–86.
Burzio, L. (1998). Multiple correspondence. Lingua 103. 79–109.
Chomsky, N. (1964). Current issues in linguistic theory. In J. A. Fodor and J. J. Katz
(eds.) The Structure of Language. Englewood Cliffs, N.J.: Prentice-Hall.
Chomsky, N. and M. Halle (1965). Some controversial questions in phonological theory.
JL 1. 97–138.
Dekkers, J., F. van der Leeuw, and J. van de Weijer (eds.) (2000). Optimality Theory:
Phonology, Syntax, and Acquisition. Oxford: Oxford University Press.
Derwing, B. L. and W. J. Baker (1986). Assessing morphological development. In P.
Fletcher and M. Garman (eds.) Language Acquisition: Studies in First Language
Development. Cambridge: Cambridge University Press.
Dinnsen, D. A., L. W. McGarrity, K. O’Connor, and K. Swanson (2000). On the role of
sympathy in acquisition. Language Acquisition 8. 321–361.
Eimas, P. D., E. R. Siqueland, P. Jusczyk, and J. Vigorito (1971). Speech perception in
infants. Science 171. 303–306.
Eisner, J. (1997). Efficient generation in primitive Optimality Theory. In Proceedings of
the 35th Annual Meeting of the Association for Computational Linguistics, Madrid.
[ROA 259, http://roa.rutgers.edu]
Ellison, M. (1994). Phonological derivation in Optimality Theory. In COLING, Vol. 2.
1007–1013, Kyoto. [ROA 75, http://roa.rutgers.edu]
Fikkert, P. (1994). On the Acquisition of Prosodic Structure. Leiden and Amsterdam:
Holland Institute of Generative Linguistics.
Fodor, J. A., T. G. Bever, and M. F. Garrett (1974). The Psychology of Language. New
York: McGraw Hill.
Friederici, A. D. and J. E. Wessels (1993). Phonotactic knowledge of word boundaries
and its use in infant speech perception. Perception and Psychophysics 54. 287–
295.
Gerken, L. (1994). Child phonology: past research, present questions, future directions.
In M.A. Gernsbacher (ed.) Handbook of Psycholinguistics. San Diego: Academic
Press.
Gomez, R. L. and L. Gerken (1999). Artificial grammar learning by 1-year-olds leads
to specific and abstract knowledge. Cognition 70. 109–135.
1. Introduction
In this chapter we consider syllable types in acquisition, language typology,
and also in a third dimension, namely production frequency in the language
surrounding the language learner.
In Optimality Theory (OT) (Prince and Smolensky 1993) both language ac-
quisition and language typology can be accommodated. The basic assumption
is that constraints are universal, but that the rankings of these constraints are
language particular. For language typology the idea is that different rankings
reflect different (possible) languages. For acquisition the idea is that the learner
needs to acquire the language-specific ranking of his mother tongue. The assumption
here, as in most other work on acquisition to date (Gnanadesikan this
volume, Hayes this volume, Davidson et al. this volume, Prince and Tesar this
volume), is that structural constraints initially outrank faithfulness constraints.
The grammar in this state prefers structurally unmarked outputs to faithful ones.
By promoting faithfulness constraints in the ranking, or by demoting structural
constraints, the outputs can become more marked and more faithful to their
inputs.1
What is the expected relation between language typology and language ac-
quisition? Concentrating here on syllable types, languages can be structurally
marked or unmarked with respect to the structural constraints that refer to sylla-
ble type: Onset, No-Coda, *Complex-Onset, and *Complex-Coda.
A language is structurally unmarked with respect to a structural constraint when
such a constraint dominates faithfulness constraints, and it is marked when
such a constraint is dominated by faithfulness constraints. For language ac-
quisition the assumption is that the child’s output is initially structurally totally
unmarked. All structural constraints dominate all faithfulness constraints. If this
initial state of the grammar is equal to the final state grammar of the language
to be learned, no further developmental steps need to take place. However, if
the language to be learned is marked in one or more ways, the learner needs
to acquire the appropriate grammar. The assumption is that this is done by
promoting faithfulness constraints to positions that outrank specific structural
constraints. This will lead to outputs that can be marked with respect to the
outranked structural constraint. When the language to be learned is marked in
several respects, it can be hypothesised that the learner acquires these marked
aspects of the grammar gradually. That is, there could be a learning path where
the learner, in going from the initial state of the grammar, Ginitial, to the final
state of the grammar, Gfinal, passes through several intermediate grammars. To
combine language acquisition with language typology, the expectation is that
the intermediate grammars of the language learner are also final state grammars
of languages of the world. Vice versa, it is expected that since languages can
be marked in different respects, there are different possible learning paths the
learner of a very marked language can take to reach the final state. For example,
as we shall see below, there are languages that can violate No-Coda but not
Onset, like Thargari; and there are languages that can violate Onset but not
No-Coda, like Cayuvava. A learner who has to acquire a language that can
violate both No-Coda and Onset, like Mokilese, can either get to the G f inal
of that language through an intermediate Thargari stage, or through an inter-
mediate Cayuvava stage. In its strongest form the hypothesis is thus that there
is a 1:1 relation between grammars of languages in the world and intermediate
grammars in language acquisition.
In order to check these assumptions, we have combined data on cross-
linguistic variation in syllable types (Blevins 1995) with data on the acquisition
of syllable types (Levelt et al. 2000). In the remainder of this chapter we shall
first present the OT grammars of languages of the world, to which we refer,
slightly off the mark, as ‘cross-linguistic grammars’, that were deduced from
data in Blevins. Then we shall proceed to the developmental grammars dis-
cussed in Levelt et al. (2000), and the two sets of grammars will be lined up.
The similarities and differences will be discussed, and, finally, a solution for the
particular nature of the learning path for Dutch children is presented in terms
of syllable type frequencies in the input.
As can be seen in (1), languages allow syllable types with different degrees of
complexity. The language Hua has only one syllable type, namely CV, while
Dutch, like English, at the other extreme, allows a whole set of more complex
syllable types. The one syllable type that all languages have in common is CV,
and this type is regarded as totally unmarked. In terms of markedness it can
thus be said that Hua is structurally the most unmarked language with respect
to syllable type, while Dutch is the most marked one.
In OT there are two main types of constraints. On the one hand, there
are structural constraints that demand outputs to be structurally unmarked;
while on the other hand, there are faithfulness constraints that demand out-
puts to be faithful to their inputs, whether these are structurally marked or
not. The ranking of structural constraints vis-à-vis faithfulness constraints
in a grammar determines the structural markedness allowed in a language.
When all structural constraints outrank all faithfulness constraints the lan-
guage is structurally totally unmarked; and when all faithfulness constraints
outrank all structural constraints the language allows outputs that are struc-
turally marked in any possible way. When faithfulness constraints are ranked
among structural constraints, the language allows for a certain degree of com-
plexity in output forms. The structural constraints that are relevant here are
in (2):
The constraints Onset and No-Coda are well known from the literature.
The more general constraint *Complex is split up into one constraint refer-
ring to onsets and one referring to codas (see Kager 1999 for a similar move).
This is necessary to differentiate between languages that allow complex onsets
but no complex codas and vice versa, and also to differentiate language learn-
ers who acquire complex onsets first from those who acquire complex codas
first.
In (3), then, are the different rankings of the structural constraints from (2)
vis-à-vis a general faithfulness constraint Faith, that characterise the typo-
logically different languages in (1). This is called a factorial typology. Since we
are not interested here in the ways an output can be unfaithful, but only in the
ways in which an output syllable can be marked, a single faithfulness constraint
Faith is used here as an expository simplification. Also, the rankings of the
structural constraints among each other are not relevant here because they do
not conflict with one another.
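The way such a factorial typology is derived can be sketched in a few lines of Python. This is a minimal illustration and not part of the original analysis: it assumes that a grammar is fully described by the set of structural constraints that Faith outranks, and that a syllable type is allowed when every structural constraint it violates is in that set. The names `violated`, `inventory`, and `typology` are hypothetical.

```python
from itertools import combinations

STRUCTURAL = ("Onset", "No-Coda", "*Complex-Onset", "*Complex-Coda")
ONSETS = ("", "C", "CC")
CODAS = ("", "C", "CC")


def violated(onset: str, coda: str) -> set:
    """Structural constraints violated by a syllable with this onset and coda."""
    marks = set()
    if onset == "":
        marks.add("Onset")
    if onset == "CC":
        marks.add("*Complex-Onset")
    if coda != "":
        marks.add("No-Coda")
    if coda == "CC":
        marks.add("*Complex-Coda")
    return marks


def inventory(dominated) -> frozenset:
    """Syllable types allowed when Faith outranks exactly `dominated`."""
    return frozenset(
        onset + "V" + coda
        for onset in ONSETS
        for coda in CODAS
        if violated(onset, coda) <= set(dominated)
    )


def typology() -> dict:
    """Map each distinct inventory to the smallest constraint set yielding it."""
    result = {}
    for size in range(len(STRUCTURAL) + 1):
        for subset in combinations(STRUCTURAL, size):
            # Smaller subsets are visited first, so setdefault keeps the smallest.
            result.setdefault(inventory(subset), subset)
    return result


if __name__ == "__main__":
    for types, subset in typology().items():
        print(", ".join(subset) or "(none)", "->", sorted(types))
```

Under these assumptions, promoting Faith over *Complex-Coda alone adds no syllable type, because a syllable with a complex coda also violates No-Coda.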
Marked IV languages. These are depicted in (4). In all of these linkings, Faith
gradually moves up in the constraint ranking from the lowest position in the
ranking to the highest position by promoting over one structural constraint at a
time. The linkings of the grammars from Unmarked to Marked IV languages
are hypothesised to be the learning paths that a language learner could take,
going from unmarked Gi to a most marked Gf. For ease of exposition, in
(4) languages and a typical example of the added syllable types are mentioned
instead of grammars.
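The idea that Faith is promoted over one structural constraint at a time can be made concrete by enumerating the possible orders of promotion. The sketch below is an illustration under two assumptions that are not stated in this form in the text: a grammar is identified with the set of constraints that Faith outranks, and a promotion only counts as a step if it adds at least one syllable type. The function names are hypothetical.

```python
from itertools import permutations

STRUCTURAL = ("Onset", "No-Coda", "*Complex-Onset", "*Complex-Coda")


def allowed(dominated: frozenset) -> frozenset:
    """Syllable types allowed when Faith outranks the constraints in `dominated`."""
    types = set()
    for onset in ("", "C", "CC"):
        for coda in ("", "C", "CC"):
            marks = set()
            if onset == "":
                marks.add("Onset")
            if onset == "CC":
                marks.add("*Complex-Onset")
            if coda:
                marks.add("No-Coda")
            if coda == "CC":
                marks.add("*Complex-Coda")
            if marks <= dominated:
                types.add(onset + "V" + coda)
    return frozenset(types)


def learning_paths() -> list:
    """Orders of promotion in which every step adds at least one syllable type."""
    paths = []
    for order in permutations(STRUCTURAL):
        dominated = frozenset()
        for constraint in order:
            larger = dominated | {constraint}
            if allowed(larger) == allowed(dominated):
                break  # no visible effect on the inventory: discard this order
            dominated = larger
        else:
            paths.append(order)
    return paths


if __name__ == "__main__":
    for path in learning_paths():
        print(" -> ".join(path))
```

With these assumptions, `learning_paths()` yields twelve orders, in each of which No-Coda is outranked before *Complex-Coda.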
The Dutch children of our study appear to follow only a small subset of
the possible learning paths that go from an unmarked initial stage to a fi-
nal marked stage. These paths are marked by the shaded boxes in (4). In the
next section we shall discuss how these stages have been established. In sec-
tion 5, the relation between cross-linguistic grammars and acquisition stages
will be discussed, and in section 6 the specific learning paths taken by Dutch
children will be explained through an interaction of the adult grammar and
frequency.
4. Language acquisition
In Levelt et al. (2000) the development of syllable types in longitudinal data of
twelve children acquiring Dutch as their first language was examined. For a pe-
riod of one year these children were recorded every other week. The children’s
ages ranged between 0;11 and 1;11 at the start of the data-collecting period. Ap-
proximately 20,000 spontaneous utterances formed the input to a syllabification
algorithm developed by Schiller (Schiller et al. 1996). See Schiller et al. (1996)
or Levelt et al. (2000) for details on this algorithm. The resulting syllable type
data from primary stressed positions were then submitted to a Guttman scale,
at four different points in time: namely, first recording, first three recordings,
first six recordings, and all recordings. With a Guttman scale, a shared order –
of development in this case – can be established, and it can be seen to what
extent a particular order is followed by individual subjects. It turned out that
there was a shared developmental order, with two variants. This order is shown
in (5):2
The structures CV, CVC, V, and VC were acquired in this order by all the
children. One group of children (A, N=9) then went on to acquire CVCC,
VCC, and CCV, CCVC, while another group (B, N=3) acquired the same
structures in a different order, namely CCV, CCVC and CVCC, VCC. The last
syllable type for both groups to be acquired was CCVCC.
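How a Guttman scale establishes a shared order can be illustrated with a small sketch. The data below are hypothetical toy values, not the data of Levelt et al. (2000), and the error count used here (deviations from the ideal pattern for a child's total score) is one common choice that is assumed for the illustration, not necessarily the procedure of the original study.

```python
def guttman_scale(data: dict, items: list) -> tuple:
    """Return (item order, number of errors, coefficient of reproducibility)."""
    # Order items from most to least widely attested (ties keep the given order).
    totals = {item: sum(row[item] for row in data.values()) for item in items}
    order = sorted(items, key=lambda item: -totals[item])
    errors = 0
    for row in data.values():
        score = sum(row[item] for item in order)
        # Ideal pattern for this score: the `score` easiest items and nothing else.
        ideal = [1] * score + [0] * (len(order) - score)
        observed = [row[item] for item in order]
        errors += sum(o != i for o, i in zip(observed, ideal))
    reproducibility = 1 - errors / (len(data) * len(order))
    return order, errors, reproducibility


# Hypothetical toy data: 1 = syllable type attested for the child, 0 = not attested.
ITEMS = ["CV", "CVC", "V", "VC", "CVCC"]
TOY = {
    "child1": dict(zip(ITEMS, [1, 1, 1, 1, 1])),
    "child2": dict(zip(ITEMS, [1, 1, 1, 0, 0])),
    "child3": dict(zip(ITEMS, [1, 1, 0, 0, 0])),
    "child4": dict(zip(ITEMS, [1, 0, 1, 0, 0])),  # deviates from the shared order
}

if __name__ == "__main__":
    order, errors, reproducibility = guttman_scale(TOY, ITEMS)
    print(order, errors, reproducibility)
```

A coefficient of reproducibility close to 1 indicates that the individual children largely follow the shared order; in the toy data only the fourth child deviates from it.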
From these data a learning path through grammatical stages was deduced,
from a Gi allowing only a structurally most unmarked output, via intermedi-
ate grammars, to a Gf similar to the grammar of Dutch. In these grammars, the
structural constraints from (2) above featured, next to a general faithfulness con-
straint Faith. We shall come back to the exact nature of these developmental
grammars below.
210 Clara C. Levelt and Ruben van de Vijver
Let us now turn to the second problem for the hypothesis that there is a 1:1 rela-
tion between developmental grammars and cross-linguistic grammars: there is
less variation in development than expected. If we neglect G3 and G6 for a mo-
ment, of the twelve possible learning paths that link Gi to Gf only two are taken
by the Dutch language learners (the shaded boxes in (4)). The developmental
grammar G2 , for example, corresponds to the grammar of Thargari (CV, CVC),
not to the grammars of either Cayuvava (CV, V) or Arabela (CV, CCV), which
are equally possible. In going from Gi to G2 , Faith could have promoted over
*Complex-Onset, resulting in the grammar for Arabela, or over Onset,
The question here is why the learners at this stage prefer CVC syllables to both
V and CCV syllables. For CCV, the answer could simply lie in the fact that the
sounds needed for the production of complex onsets, especially liquids, are not
acquired at this point (Fikkert 1994). The CCV option is therefore not a very
likely one.
The learner’s choice for CVC syllables instead of V syllables could be due
to the special role of codas in Dutch. Lax vowels in Dutch can only appear
in closed syllables (cf. Van der Hulst 1984, Kager 1989, Fikkert 1994, Van
Oostendorp 1995 and Booij 1995, to give a non-exhaustive list). This gener-
alisation is based on three observations. First, words cannot end in lax vowels
(/tɑksi/, but not /tɑks /). Second, lax vowels cannot appear in hiatus (/piano/
but not /p ano/). Finally, lax vowels can be followed by at most two non-coronal
consonants (/rɑmp/), while tense vowels can be followed by at most one
non-coronal consonant (/ram/ but not /ramp/). These observations can be explained
if it is assumed that lax vowels only appear in closed syllables and that syllables
can have at most one extra non-coronal consonant following the coda. In other
words, syllables with lax vowels are obligatorily closed. In contrast to this,
there is no process in Dutch that requires syllables to be onsetless. The picture
that emerges is that CVC syllables are more salient than onsetless syllables.
This phonological saliency could draw the attention of learners, and could lead
to the acquisition of CVC before V.
One problem with this phonological explanation is that the distinction between
lax and tense vowels is acquired relatively late in the acquisition process
(Fikkert 1994). As a consequence, this distinction has not yet been acquired
at the stage at which the children master the closed syllable. It
is not clear, then, whether children at this stage would recognise the significance
of codas, and therefore of CVC syllables, in Dutch. However, CVC syllables
could also attract the learner's attention through their frequency in the
speech input.
In order to check this hypothesis, we looked at actual child-directed speech.
The corpus of syllable types in child-directed speech that was used here (J. C.
van de Weijer, personal communication (1997)) contained 112,926 syllables.6
The syllable type CVC has a frequency of 32.1 percent (N = 36,196) in
the speech corpus, and is the most frequent syllable by far after CV (44.8
percent, N = 50,603). Moreover, it is far more frequent than both V (3.8 percent,
N = 4,345) and CCV (1.4 percent, N = 1,564), and the choice for a G2 that
would allow for CVC syllables could thus very well be based on frequency
information.
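The percentages just cited follow directly from the reported counts, as the short check below shows. The counts and the corpus total are those given in the text; the names `COUNTS`, `TOTAL`, and `percent` are introduced here for illustration only.

```python
TOTAL = 112_926  # syllables in the child-directed speech corpus

# Counts reported in the text for the four syllable types under discussion.
COUNTS = {"CV": 50_603, "CVC": 36_196, "V": 4_345, "CCV": 1_564}


def percent(count: int, total: int = TOTAL) -> float:
    """Relative frequency as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    # List the types from most to least frequent.
    for syllable, count in sorted(COUNTS.items(), key=lambda kv: -kv[1]):
        print(f"{syllable}: {percent(count)} percent (N = {count:,})")
```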
The choice here for onsetless syllables instead of a syllable with a complex mar-
gin could have the phonological explanation offered before: the sounds needed
to produce clusters of consonants are still problematic. For this explanation
to be tested, the data on the acquisition of certain sounds and the data on the
acquisition of syllable types need to be compared.
Another explanation is again provided by frequency information. The fre-
quency of the new syllable types that would be allowed by a grammar that has
Faith promoted over Onset, namely V and VC, is 15.8 percent (N = 17,883). The
frequency of the new syllable types that would be allowed by a grammar that has
Faith promoted over *Complex-Coda, namely CVCC, is 3.2 percent (N = 3,669).
Finally, if Faith were to be promoted over *Complex-Onset, the types
CCV and CCVC would be allowed. These types together have a frequency of
3.4 percent (N = 3,804). The onsetless syllables V and VC, with a frequency of
15.8 percent, are more frequent than either the type CVCC (3.2 percent) or the
types with complex onsets CCV and CCVC (3.4 percent). In order to allow for
the most frequent set of syllables, the onsetless syllables, Faith is promoted
over Onset.
At this point variation is found. Some learners take the direction of Spanish
(N = 3), while others take the direction of Finnish (N = 9). The frequency of
the syllable types that would be added to the inventory by promoting Faith
over *Complex-O, CCV and CCVC, is 3.4 percent (N = 3,804), while the
frequency of the types CVCC and VCC, that would be allowed by promot-
ing Faith over *Complex-C, is 3.7 percent (N = 4,146). This difference is
small, but it is still significant (Fisher Exact test, p < 0.001).7 It is thus neces-
sary to become more specific about the possible effect of input frequency on
acquisition. Apparently, learners need a certain threshold to notice a frequency
difference. So let us follow the learning path again, taking into account the ratios
of the different syllable type inputs (standard-error analysis).8 The idea is that
the learner has to hear some syllable type X not just more often but considerably
more often than some other syllable type Y in order for a grammar change to
take place in the direction of syllable type X, rather than in the direction of
syllable type Y.
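The contrast between a statistically significant difference and a noticeable one can be illustrated numerically. The chapter reports a Fisher Exact test; the sketch below swaps in a simpler normal approximation to a two-sided binomial test, which is an assumption of this example and not the authors' procedure. The function name `two_sided_p` is hypothetical.

```python
import math


def two_sided_p(count_x: int, count_y: int) -> float:
    """Normal approximation to a two-sided binomial test that X and Y are equally likely."""
    n = count_x + count_y
    # Under the null hypothesis the count of X has mean n/2 and s.d. sqrt(n)/2,
    # so the standardised difference reduces to |x - y| / sqrt(n).
    z = abs(count_x - count_y) / math.sqrt(n)
    return math.erfc(z / math.sqrt(2))


COMPLEX_CODA = 4_146   # CVCC + VCC, as reported in the text
COMPLEX_ONSET = 3_804  # CCV + CCVC, as reported in the text

if __name__ == "__main__":
    print("p value:", two_sided_p(COMPLEX_CODA, COMPLEX_ONSET))
    print("ratio:", round(COMPLEX_CODA / COMPLEX_ONSET, 2))
```

With counts of this size even a ratio close to 1 comes out as significant, which is why a ratio-based threshold is the more informative measure of what a learner could plausibly notice.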
(8)
Gi: *Complex-C, *Complex-O, Onset, No-Coda >> Faith
G2 options                       Result                  Input frequency
1: >> Faith >> No-Coda           Thargari. Adds CVC      CVC: 36,196
2: >> Faith >> Onset             Cayuvava. Adds V        V: 4,345
3: >> Faith >> *Complex-O        Arabela. Adds CCV       CCV: 1,564
Option 1 is taken
The CVC/V ratio is 8.33 (95 percent confidence interval: 8.08 to 8.58), the
CVC/CCV ratio is 23.14 (95 percent confidence interval: 21.98 to 24.29).
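A ratio of two counts with an approximate confidence interval can be computed along the following lines. The text does not spell out the formula behind its standard-error analysis, so the delta-method approximation used here is an assumption; it gives intervals close to, but not identical with, those reported. The function name `ratio_ci` is hypothetical.

```python
import math


def ratio_ci(count_x: int, count_y: int, z: float = 1.96) -> tuple:
    """Ratio of two counts with an approximate 95 percent confidence interval.

    Assumes the standard error of log(x/y) is sqrt(1/x + 1/y) and converts it
    to the ratio scale by the delta method.
    """
    ratio = count_x / count_y
    se = ratio * math.sqrt(1 / count_x + 1 / count_y)
    return ratio, ratio - z * se, ratio + z * se


if __name__ == "__main__":
    for label, other in (("CVC/V", 4_345), ("CVC/CCV", 1_564)):
        ratio, low, high = ratio_ci(36_196, other)
        print(f"{label}: {ratio:.2f} ({low:.2f} to {high:.2f})")
```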
(9)
G2: *Complex-C, *Complex-O, Onset >> Faith >> No-Coda
G3 options                           Result                           Input frequency
1: >> Faith >> Onset, . . .          Mokilese. Adds V and VC          V + VC: 17,883
2: >> Faith >> *Complex-O, . . .     Sedang. Adds CCV and CCVC        CCV + CCVC: 3,804
3: >> Faith >> *Complex-C, . . .     Klamath. Adds CVCC               CVCC: 3,669
Option 1 is taken
(10)
G3: *Complex-C, *Complex-O >> Faith >> Onset, No-Coda
G4 options                           Result                           Input frequency
1: >> Faith >> *Complex-O, . . .     Spanish. Adds CCV and CCVC       CCV + CCVC: 3,804
2: >> Faith >> *Complex-C, . . .     Finnish. Adds CVCC and VCC       CVCC + VCC: 4,146
Some learners take option 1, other learners take option 2
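The selection procedure in (8) to (10) can be summarised in a small simulation. This is a sketch under stated assumptions: the threshold value of 2.0 is purely hypothetical (the text leaves the exact threshold open), the counts for VC, CCVC, and VCC are obtained by subtraction from the reported sums, and no count is given for CCVCC, which does not matter for the stages followed here. All function names are hypothetical.

```python
STRUCTURAL = ("Onset", "No-Coda", "*Complex-Onset", "*Complex-Coda")

COUNTS = {
    "CVC": 36_196, "V": 4_345, "CCV": 1_564, "CVCC": 3_669,
    "VC": 17_883 - 4_345,    # V + VC = 17,883
    "CCVC": 3_804 - 1_564,   # CCV + CCVC = 3,804
    "VCC": 4_146 - 3_669,    # CVCC + VCC = 4,146
}


def allowed(dominated: frozenset) -> frozenset:
    """Syllable types allowed when Faith outranks the constraints in `dominated`."""
    types = set()
    for onset in ("", "C", "CC"):
        for coda in ("", "C", "CC"):
            marks = set()
            if onset == "":
                marks.add("Onset")
            if onset == "CC":
                marks.add("*Complex-Onset")
            if coda:
                marks.add("No-Coda")
            if coda == "CC":
                marks.add("*Complex-Coda")
            if marks <= dominated:
                types.add(onset + "V" + coda)
    return frozenset(types)


def options(dominated: frozenset) -> dict:
    """Summed input frequency of the types that each single promotion would add."""
    result = {}
    for constraint in STRUCTURAL:
        if constraint in dominated:
            continue
        new = allowed(dominated | {constraint}) - allowed(dominated)
        if new:  # promotions that add nothing are not options
            result[constraint] = sum(COUNTS.get(t, 0) for t in new)
    return result


def next_step(dominated: frozenset, threshold: float = 2.0) -> set:
    """Promotions predicted at this stage; more than one means variation."""
    freq = options(dominated)
    best = max(freq.values())
    return {c for c, f in freq.items() if best / f < threshold}


def follow(threshold: float = 2.0) -> list:
    """Follow the learning path until all constraints are outranked or variation arises."""
    dominated, path = frozenset(), []
    while len(dominated) < len(STRUCTURAL):
        choice = next_step(dominated, threshold)
        path.append(sorted(choice))
        if len(choice) > 1:
            break  # learners may diverge at this point
        dominated |= choice
    return path


if __name__ == "__main__":
    print(follow())
```

Any threshold between the smallest decisive ratio and the ratio of complex-coda to complex-onset types gives the same outcome in this sketch: No-Coda first, then Onset, then variation between the two *Complex constraints.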
7. Conclusion
Syllable types have been considered from three different angles in this chapter:
acquisition, typology, and production frequency. First, in line with the OT literature on
language acquisition, it is assumed that acquisition is a process in which faithfulness
comes to dominate more and more structural constraints. Second, the syllable
types of a number of languages have been considered and, in line with one of
the central claims of OT, it was concluded that differences between languages
are represented as a difference in the ranking of constraints. This served as the
basis for the hypothesis that every stage in the acquisition of a language should
correspond to a cross-linguistic grammar.
This correspondence has been found for all but two stages in the acquisition
of Dutch syllable types. It could be that the relevant non-corresponding stages,
requiring reference to conjoined constraints, are acquisition-specific. However,
the conjoined constraint Onset & No-Coda appears to be necessary in
the analysis of the syllable in Central Sentani (Hartzler 1976), too. As far as
we can see, then, there is no principled reason why *Complex-Onset &
*Complex-Coda should not be active in some, as yet unknown, language.
The hypothesis that every stage in the acquisition of a language corresponds to
a cross-linguistic grammar is thus, at least with respect to syllable type, largely
borne out.
The other part of our hypothesis implied that every cross-linguistically de-
termined grammar could form an intermediate developmental grammar in the
learning path from unmarked Gi to the Gf of a very marked language. It turned
out, however, that learners of Dutch followed a specific learning path, from
numerous possible paths.
This lack of variation is attributed to factors that guide the learner through
the possibilities. The frequency of syllable types in the speech input to the
learner appears to determine which learning path is followed. If the child has a
choice between various paths, the path of the noticeably most frequent syllable
type is chosen. If there is no noticeable difference between the frequencies of
syllable types that correspond to different possible paths, variation is expected
and attested. Future research should determine what the exact threshold is for
a difference to be noticeable and effective. Based on our findings regarding
the effect of frequencies in the speech input on the learning path, many new
hypotheses can be formed. For instance, it follows that learners of languages
with similar syllable types but with different speech input frequencies should
show different learning paths. Combining acquisitional data with typological
data in the way outlined here thus appears to provide a good method for research,
leading to interesting hypotheses and findings in both fields.
notes
∗ This chapter was presented at the Third Biannual Utrecht Phonology Workshop,
11–12 June 1998, Utrecht, The Netherlands, organised by René Kager and Wim
Zonneveld. We thank Paul Boersma, Nine Elenbaas, Vincent van Heuven, Joe Pater,
Shigeko Shinohara, and Joost van de Weijer for their helpful comments and their
contributions to this chapter. Financial support to Clara Levelt was provided by the
Netherlands Organisation for Scientific Research (NWO).
1. The input is assumed to be similar to the adult surface form. That is, we assume that the
child perceives the adult form quite accurately (Jusczyk 1997) and that this perceived
form is taken to be the underlying form. If necessary, knowledge of morphological
alternations, acquired at a later stage, will lead to changes in the input forms.
2. The developmental order here has been slightly simplified compared to the order
given in Levelt, Schiller, and Levelt (2000). In their article, syllable types with complex
onsets and those with complex codas are analysed as being acquired in two steps:
CVCC before VCC and CCV before CCVC.
3. As an alternative to conjoined constraints, Shigeko Shinohara suggests high-ranked
constraints against more than one deviation from the canonical syllable types. We
leave this suggestion open for future consideration.
4. Thanks to Nine Elenbaas for providing us with this example.
5. For now we disregard the developmental stages that require reference to conjoined
constraints. G3 and G4 from (6) are thus collapsed, as are G6 and Gf.
6. We are grateful to Joost van de Weijer for generously sharing his data with us. For
details on the speech input database and an analysis of the caretaker’s speech, see
Van de Weijer (1999).
7. The data base of child-directed speech that was available to us contains data from
one caretaker only. It could thus be that the frequencies of these syllable types are
balanced slightly differently for different speakers, to the advantage of complex
codas in the speech of one speaker, but to the advantage of complex onsets in another
speaker. However, in the Dutch data base CELEX, which is based on printed texts,
the frequency for CCV plus CCVC is 6.15 percent, the frequency for CVCC plus
VCC is 6.53 percent, thus showing the same slightly higher frequency for syllable
types with complex codas.
8. Special thanks to Paul Boersma for his help with the statistics.
References
Blevins, J. (1995). The syllable in phonological theory. In J. Goldsmith (ed.) The Hand-
book of Phonological Theory. Oxford: Blackwell.
Booij, G. (1995). The Phonology of Dutch. Oxford: Clarendon Press.
Davidson, L., P. Jusczyk, and P. Smolensky (this volume). The initial and final states:
theoretical implications and experimental explorations of Richness of the Base.
Joe Pater
1. Introduction
It is a commonplace observation that there are gaps between perception and
production in child phonology; that children often appear to have receptively
acquired a segmental or prosodic contrast that is neutralised in production (see
section 2 below for a review of some evidence). The developmental lag of pro-
ductive behind receptive ability raises a fundamental question: does perception
or production reflect the state of phonological competence in the child’s emerg-
ing linguistic system? The answer given in this chapter is that the child’s knowl-
edge of phonology is displayed in both domains, and that linguistic competence
can be sufficiently variegated to handle gaps between their development.
Studies of phonological acquisition usually take children’s productions as
the data to be accounted for by phonological analysis, and studies of child
phonology in Optimality Theory (OT) usually follow this tradition (see the
Introduction to this volume for references to this literature; cf. Hale and Reiss
1998, Hayes this volume). However, there do exist data on the development of
perception that appear to demand a phonological treatment. The experimental
tasks used in recent research in infant speech perception tap not only phonetic
discrimination but also the ability to store contrasts in memory, and to link these
phonetic contrasts with meaning distinctions. This research suggests that the
representations accessed in these tasks develop gradually, in ways quite parallel
to the later growth in complexity of the structures employed in production. The
theoretical challenge that arises is to account for the parallels in development
across these domains, while at the same time allowing for synchronic differences
in whether structures have been acquired productively and receptively.
Optimality Theory is well suited to meet this challenge because its constraints
are minimally violable. The activity of a constraint is not an all or nothing affair;
rather, a constraint is violated only to meet the demands of a higher ranked
constraint. I assume that the role of the phonological grammar in perception
is to regulate the complexity, or markedness, of representations that are con-
structed, and/or accessed, on the basis of the acoustic signal. This is done by
[Figure: bar chart comparing Same and Switch trials for [bI]/[dI], [lIf]/[nim], and [bI]/[dI] with chequer-board; * p < .05]
when the difficult-to-perceive contrasts are acquired, they are also accurately
represented lexically before being produced (Velleman 1988: 233).
Though many details remain to be filled in through further experimentation,
the picture that emerges from the extant research is that the lexical represen-
tations that children initially construct are reduced in segmental complexity
compared to what they can perceive. And at a later stage, when these lexical
representations are enriched, production representations remain to be elabo-
rated. What remains to be determined is the extent to which the restrictions
on the segmental complexity of early lexical representations exactly mirror
the ones that apply to early productions. In the development of prosody, how-
ever, there does exist evidence for the activity of quite similar restrictions on
complexity across perception and production.
The results of this study suggest that trochees are receptively acquired before
iambs. Further evidence for this comes from another study using the Headturn
Preference Procedure, in which Jusczyk et al. (1993) show that 9-month-old
English infants listen longer to lists of initially stressed disyllables than to lists
containing finally stressed disyllables.
Here we have a considerable age gap between the reductions in complexity
evidenced in perception (about 7.5–9 months of age) and those seen in produc-
tion (around 18–24 months). Although Echols and Newport (1992) do propose
that truncations in production are due to the initial syllable being missed in
perception, considerable evidence supports the position that children do lexi-
cally encode the initial unstressed syllable (Fikkert 1994, Gerken 1994, Paradis,
Petitclerc and Genesee 1996, Jusczyk 1997: 186). We can conclude, then, that
there is a genuine comprehension/production gap when children are producing
truncated words.
Bridging the gap between receptive and productive development 227
This tableau evaluates two candidate output surface forms, S1 and S2 , for the
input lexical form L. Bracketing represents prosodic structure; the structure
of S1 corresponds to (4b), and S2 to (4d). The failed candidate S2 , with an
initial syllable outside of foot structure corresponds to the adult prosodification
(Kager 1989, Pater 2000). It violates two of the stipulations of the WordSize
constraint, and is thus marked with a violation. Candidate S1 violates none of
The tableaux in (8) and (10) are identical, except for the labelling in (10) of the
input as the surface form, and of the candidate outputs as lexical forms. I make
The pair of tableaux in (15) and (16) represent the stage of development in
which children lexically represent initial unstressed syllables, but fail to produce
them. The WordSize constraint is violated in comprehension to satisfy higher
Max(SL), but continues to force truncation in production where Max(SL) fails
to apply. The comprehension/production gap is thus dealt with as an instance
of minimal constraint violation.
The adult state would be characterised by the re-ranking of Max(LS) above
WordSize, as illustrated by (17):
(17) Max(SL), Max(LS) >> WordSize: adult production
L: [ə [rád ]Ft ]PrWd Max(SL) Max(LS) WordSize
S1 [[ád ]Ft ]PrWd **!
This, of course, abstracts from the rich interactions in the adult grammar be-
tween faithfulness and the structural constraints encapsulated in WordSize
(Pater 2000).
Summing up, we have seen that the effects of the WordSize constraint
change over the course of development due to its ranking with respect to the
posited faithfulness constraints:
(18)
Stage 1 – Fully satisfied WordSize:       WordSize >> Max(SL), Max(LS)
Stage 2 – Minimally violated WordSize:    Max(SL) >> WordSize >> Max(LS)
Stage 3 – ‘Inactive’ WordSize:            Max(SL), Max(LS) >> WordSize
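The three stages can be checked mechanically with a small evaluation routine. The sketch below is illustrative only: candidates are reduced to the labels "faithful" and "truncated", the number of Max violations assigned to the truncated candidate (2) is an assumption, and violations of unranked constraints within a stratum are simply summed. All names are hypothetical.

```python
def violations(candidate: str, mapping: str) -> dict:
    """Violation counts for a candidate.

    `mapping` is "comprehension" (surface to lexical, where Max(SL) applies)
    or "production" (lexical to surface, where Max(LS) applies).
    """
    truncated = candidate == "truncated"
    return {
        "WordSize": 0 if truncated else 1,
        "Max(SL)": 2 if truncated and mapping == "comprehension" else 0,
        "Max(LS)": 2 if truncated and mapping == "production" else 0,
    }


def optimal(ranking: list, mapping: str) -> str:
    """Pick the candidate with the best violation profile.

    `ranking` is a list of strata, highest ranked first; profiles are compared
    stratum by stratum, so a higher stratum always takes priority.
    """
    def profile(candidate: str) -> tuple:
        marks = violations(candidate, mapping)
        return tuple(sum(marks[c] for c in stratum) for stratum in ranking)

    return min(("faithful", "truncated"), key=profile)


STAGES = {
    1: [["WordSize"], ["Max(SL)", "Max(LS)"]],
    2: [["Max(SL)"], ["WordSize"], ["Max(LS)"]],
    3: [["Max(SL)", "Max(LS)"], ["WordSize"]],
}

if __name__ == "__main__":
    for stage, ranking in STAGES.items():
        print(stage, optimal(ranking, "comprehension"), optimal(ranking, "production"))
```

Only the stage 2 ranking yields a faithful form in comprehension together with a truncated form in production, which is the comprehension/production gap.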
detail in memory, they do not show that the infants are capable of linking
these minimal phonological distinctions to meaning differences. The apparent
contradiction between the results of Jusczyk and Aslin (1995) and Stager and
Werker (1997) can be resolved if we acknowledge the existence of two levels
of representation. The level of lexical representation links phonological form to
meaning, and structures at this level are accessed in the methodology employed
by Werker and colleagues. The surface phonological representation, however,
is meaning-free, and this is the level accessed by the methodology used by
Jusczyk and colleagues.
Under this view, we would need separate faithfulness constraints for the
mapping from perceived acoustic representation to the surface representation
(Faith(AS)), as well as the constraints mapping from surface to lexical
representation (Faith(SL)). A ranking of Faith(AS) >> Markedness >>
Faith(SL) would produce richer surface representations than lexical ones,
consistent with the divergence between the Jusczyk and Aslin (1995) and Stager
and Werker (1997) results. The posited levels of representations, along with the
relevant faithfulness constraints, are illustrated in (19).
(19)
Acoustic Representation Present at birth; non-language-specific
⇓ Faith(AS) ⇓
Surface Representation Language-specific; established at 6–9 months
⇓ Faith(SL) ⇓
Lexical Representation Established at 11–18 months
⇓ Faith(LS) ⇓
Surface Representation Established at 18–24 months
string [ə rád ]. Note that we are dropping Smolensky’s (1996) premiss that the
surface strings must faithfully represent the phonetic form:
(22)    Acoustic–phonetic    Surface                Lexical
(a)     ə rád                [[ád ]Ft ]PrWd         /ád /
(b)     ə rád                [ə [rád ]Ft ]PrWd      /ə rád /
Markedness constraints would prefer (22a); this would be the correct outcome
for the earliest stages of acquisition, in which comprehension representations
are simplified. To make (22b) optimal, it must be the case that there are con-
straints that conflict with markedness besides ordinary lexical–surface faithful-
ness, which are fully satisfied in both (22a) and (22b). Here we need to allow
at least a limited grammatical status to acoustic–phonetic representations, as
the targets of Faith(AS) constraints. Such constraints are in a sense analo-
gous to Output–Output constraints (see, e.g., McCarthy and Prince 1995, 1999,
Benua 1997) in demanding identity between the surface form and some non-
input form. The rankings that generate the three stages of development would be
identical to those proposed in section 3.1, with the exception of the substitution
of Faith(AS) for Faith(SL); faithfulness between lexical and surface forms
is now governed by a single set of constraints. Generalising over markedness
and faithfulness constraints, the schema for the stages of acquisition would be
as follows:
(23) Stage 1: Markedness >> Faith(AS), Faith(LS)
     Stage 2: Faith(AS) >> Markedness >> Faith(LS)
     Stage 3: Faith(AS), Faith(LS) >> Markedness
Stage 1 is the schema for reduced representations in comprehension and pro-
duction, stage 2 for faithful comprehension and reduced production, and stage
3 for faithful comprehension and production. In (24) I provide a tableau for
perception at stage 1, using the structural constraint WordSize, along with
general ‘Faith’ constraints that are violated by any change to the segmental
makeup of a string:
(24) WordSize >> Faith(AS), Faith(LS): perception
Acoustic–phonetic form: ə rád              WordSize    Faith(AS)    Faith(LS)
(a) L: /ád / ∼ S: [[ád ]Ft ]PrWd                        *
(b) L: /ə rád / ∼ S: [ə [rád ]Ft ]PrWd      *!
(c) L: /ád / ∼ S: [ə [rád ]Ft ]PrWd         *!                       *
Following McCarthy and Prince (1999), I take the candidates supplied by Gen
to be input∼output pairs (= lexical∼surface). Given by the perceptual system is
the form ə rád . Because of the ranking of Faith(AS) beneath WordSize,
the reduced structure (24a) is preferred over the faithful representations of the
acoustic–phonetic string in (24b) and (24c). Note that candidate (24c) is har-
monically bounded by (24b); no ranking of these constraints will ever make it
optimal. This points to the fact that this model cannot produce lexical repre-
sentations that are simplified relative to surface ones; the consequences of this
will be taken up below. For now, the important point is that this model does
allow for initially impoverished comprehension, and subsequent development,
unlike the proposal in Smolensky (1996).
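The claim about harmonic bounding can be verified by trying every total ranking of the three constraints. The violation marks below are read from tableau (24); where the extracted layout leaves the column of a mark unclear, the assignment is an assumption of this sketch. The names `TABLEAU`, `bounds`, and `winners` are hypothetical.

```python
from itertools import permutations

CONSTRAINTS = ("WordSize", "Faith(AS)", "Faith(LS)")

# Violation marks for candidates (24a), (24b), and (24c).
TABLEAU = {
    "a": {"WordSize": 0, "Faith(AS)": 1, "Faith(LS)": 0},
    "b": {"WordSize": 1, "Faith(AS)": 0, "Faith(LS)": 0},
    "c": {"WordSize": 1, "Faith(AS)": 0, "Faith(LS)": 1},
}


def bounds(x: str, y: str) -> bool:
    """True if x harmonically bounds y: never worse, and better on some constraint."""
    never_worse = all(TABLEAU[x][c] <= TABLEAU[y][c] for c in CONSTRAINTS)
    sometimes_better = any(TABLEAU[x][c] < TABLEAU[y][c] for c in CONSTRAINTS)
    return never_worse and sometimes_better


def winners() -> dict:
    """Optimal candidate under each total ranking of the three constraints."""
    return {
        ranking: min(TABLEAU, key=lambda cand: tuple(TABLEAU[cand][c] for c in ranking))
        for ranking in permutations(CONSTRAINTS)
    }


if __name__ == "__main__":
    print("b bounds c:", bounds("b", "c"))
    for ranking, winner in winners().items():
        print(" >> ".join(ranking), "->", winner)
```

Candidate (c) wins under none of the six rankings; the choice between (a) and (b) depends only on the relative ranking of WordSize and Faith(AS).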
The subsequent development is captured by the re-ranking of Faith(AS)
above WordSize; with Faith(LS) remaining beneath the markedness con-
straint, we produce the comprehension/production gap:
(25) Faith(AS) >> WordSize >> Faith(LS): perception
Acoustic–phonetic form: ə rád              Faith(AS)    WordSize    Faith(LS)
(a) L: /ád / ∼ S: [[ád ]Ft ]PrWd            *!
(b) L: /ə rád / ∼ S: [ə [rád ]Ft ]PrWd                   *
In production, the lexical form is given, and the candidate set is thus restricted
to mappings that contain it. Candidate (26b) is chosen due to its satisfaction of
WordSize; here it satisfies Faith(AS) vacuously, since working memory
contains no trace of the adult pronunciation. Interestingly, this suggests an ap-
proach to the greater pronunciation accuracy often observed in imitation: if the
adult acoustic–phonetic representation were still in memory, Faith(AS) would
target the surface forms in production, and would favour faithful pronunciations.
This alternative model retains the key element of the proposal presented
earlier in this chapter: structural constraints influence the outcome of compre-
hension. In deriving the structure of lexical forms from surface forms with-
out the influence of markedness, and in treating the ‘input∼output’ mapping
as equivalent to a ‘lexical∼surface’ mapping it is consistent with Smolensky
238 Joe Pater
(27) we must assume that the same constraints are operating both within
and across words. We might, as Matthei (1989) proposed, move the
output constraints from the output lexicon to somewhere else – but
where? And if we do that, how do we specify the way that those
constraints will interact with the lexicon to create the child’s output
representations?
Bridging the gap between receptive and productive development 239
In terms of Optimality Theory, the answers to these questions are now obvious:
output constraints are in the grammar, and they interact with the lexicon through
the mediation of faithfulness constraints.
5. Conclusions
This proposal follows Smolensky (1996) in using Optimality Theory to model
developing receptive and productive phonology with a single grammar. Where
it differs is in that the grammar constrains early comprehension as well as early
production, instead of simply mapping perceived surface forms to identical
lexical forms and thus making receptive phonology flawless from the outset of
acquisition.
While it might seem obvious that comprehension does in fact develop, current
evidence from developmental speech perception further suggests that receptive
competence unfolds in a manner parallel to the later development of productive
competence, with similar initial unmarked segmental and prosodic structures
in the two domains.
These parallels can be captured by having a single set of structural constraints
that applies to both the produced surface form and the receptively created
and accessed lexical form. With these constraints ranked over faithfulness
constraints, the initial state of the grammar admits only structurally unmarked
representations. Receptive competence develops by the promotion of
comprehension-specific faithfulness constraints above the structural constraints, as well
as the establishment of rankings between the structural constraints. Productive
competence is allowed to lag behind, however, by the lower rank of the
production-specific faithfulness constraints, whose later promotion is indicated by
the evidence that relatively complete lexical representations underlie simplified
productions.
From the perspective of phonological acquisition, one might question
whether receptive development is in fact subject to all and only the constraints
that are later seen in production. Indeed, a strong version of this proposal
would hold exactly that, and it is worth maintaining that position in the ab-
sence of evidence to the contrary. It is likely that some structural constraints
are comprehension- or production-specific, but at this point, the extent to which
receptive and productive development converge on the same constraints re-
mains an empirical question. This question seems independently interesting,
since in attempting to answer it, we shall likely forge a deeper understanding
of what the shared grammatical core is between domains, as well as where they
diverge.
Beyond the domain of first language acquisition there is also a growing
body of evidence that shows that phonotactic restrictions play a role in adult
speech perception (Dupoux et al. 1999, Moreton 1999, 2000). This can be taken
as additional support for the central thesis of this chapter: that in perception,
markedness constraints do intervene between the raw acoustic signal and the
lexical representation.
Notes
∗ Portions of this chapter were presented at From Speech Perception to Word Learning:
A Workshop, at the University of British Columbia, as well as in colloquia at Utrecht
University, the Max Planck Institute for Psycholinguistics, Nijmegen, University of
Groningen, SUNY Stony Brook, and the University of Massachusetts, Amherst.
Special thanks to Janet Werker for inviting me to speak on these issues at the workshop,
and for much helpful subsequent discussion, as well as to Paul Smolensky for ex-
tensive comments on an earlier version of this chapter. Conversations with Ellen
Broselow, Angela Carpenter, Suzanne Curtin, Della Chambless, Anne Cutler, Lyn
Frazier, Bruce Hayes, Peter Jusczyk, René Kager, John Kingston, Clara Levelt, John
McCarthy, Steve Parker, Doug Pulleyblank, Chris Stager, and Wim Zonneveld also
helped to clarify my thoughts on the topic. This research was supported by SSHRC
postdoctoral fellowship 756-96-05211 and SSHRC research grant 410-98-1595, for
which I am grateful.
1. One might instead posit a more indirect link between performance and grammar,
as suggested by the linking hypotheses of Smolensky et al. (this volume). There do
not seem to be any obvious barriers to implementing the present proposal in such
terms; one would simply need a means for the rank of faithfulness constraints to vary
depending on the use to which the grammar is being put.
2. Interestingly, for one account of Jusczyk et al.’s (1999) findings to go through, we
must assume that the onset is at least sometimes borrowed from the initial syllable.
Recall that when the 7.5-month-olds are trained on guitar, they do not listen longer
to guitar passages. We might assume that in familiarisation, a lexical form is created
which is then matched against the passage. If the infants stored [tɑ r], then this would
match perfectly with the stressed portion of guitar. And indeed, when 7.5-month-olds
are trained on tar, they do listen longer to guitar passages. If, however, [ɑ r] is the
lexicalised form, then it does not match the stressed syllable of guitar (due perhaps to
Faith-based preference for dorsal over coronal). I should emphasise, however, that
my strategy in this chapter is not to provide such a direct account of the experimental
evidence. Rather, I make the minimal inference from these experiments that trochaic
forms are acquired before iambic ones, and then I provide a grammatical account of
the stage of development at which only trochees are present.
3. In so far as babbling is taken to interact with the grammar, one might hypothesise
that constraints mapping perceived surface representations to produced surface rep-
resentations would be at issue.
4. Under a Parse/Fill theory, this candidate could not even be generated, since in-
serted segments are empty syllabic nodes, rather than featurally specified phones
(unless phonetic interpretation happened to supply a [ə ] and [r] as the default vowel
and consonant, respectively). This would be another difficulty in producing reduc-
tions in structure in lexical structures under the account presented in Smolensky
(1996).
5. Similar difficulties also face the alternative to Smolensky’s model presented in Hale
and Reiss (1998), where an initial ranking of Faithfulness over Markedness is posited
to yield initially rich lexical representations. In this proposal though, not only is it
impossible to portray the development of receptive competence but the development
of production is also explicitly claimed to be extra-grammatical. One is therefore
led to wonder what empirical domain in acquisition would be considered within the
province of grammar; while Hale and Reiss suggest that perceptual experiments
might be used to tap phonological competence, it is not clear whether any of those
reviewed in the present chapter would be considered suitable.
References
Allen, G. and S. Hawkins (1978). The development of phonological rhythm. In A.
Bell and J. Hooper (eds.) Syllables and Segments. New York: Elsevier-North-
Holland.
Barton, D. (1976). The Role of Perception in the Acquisition of Phonology. Ph.D. disser-
tation, Stanford University. Distributed 1978, Indiana University Linguistics Club.
(1980). Phonemic perception in children. In Yeni-Komshian et al. (eds.). 97–116.
Beckman, J. (1998). Positional Faithfulness. Ph.D. dissertation, University of
Massachusetts, Amherst.
Benua, L. (1995). Identity effects in morphological truncation. University of Mas-
sachusetts Occasional Papers in Linguistics 18: Papers in Optimality Theory.
Amherst, Graduate Linguistic Studies Association (GLSA), University of
Massachusetts.
(1997). Transderivational Identity: Phonological Relations between words. Ph.D.
dissertation, University of Massachusetts, Amherst.
Brown, C. and J. Matthews (1997). The role of feature geometry in the development
of phonemic contrasts. In S. J. Hannahs and M. Young-Scholten (eds.) Focus on
Phonological Acquisition. Philadelphia: Benjamins.
Boersma, P. (1998). Functional Phonology. Ph.D. dissertation, University of
Amsterdam.
Burzio, L. (1996). Surface constraints versus underlying representation. In J. Durand and
B. Laks (eds.) Current Trends in Phonology: Models and Methods. Manchester:
European Studies Research Institute, University of Salford.
Compton, A. J. and M. Streeter (1977). Child phonology: data collection and preliminary
analyses. In Papers and Reports on Child Language Development 7. Palo Alto,
Calif.: Stanford University.
Demuth, K. (1995). Markedness and the development of prosodic structure. In NELS
25. Amherst: GLSA, University of Massachusetts.
Donahue, M. (1986). Phonological constraints on the emergence of two-word utterances.
Journal of Child Language 13. 209–218.
Drachman, G. (1976). Child language and language change: a conjecture and some
refutations. In J. Fisiak (ed.) Recent Developments in Historical Phonology. The
Hague: Mouton.
Dupoux, E., K. Kakehi, Y. Hirose, C. Pallier, and J. Mehler (1999). Epenthetic vowels
in Japanese: a perceptual illusion? Journal of Experimental Psychology: Human
Perception and Performance 25. 1568–1578.
Echols, C. and E. Newport (1992). The role of stress and position in determining first
words. Language Acquisition 2. 189–220.
1. The problem
All languages have distributional regularities: patterns which restrict what
sounds can appear where, including nowhere, as determined by local syn-
tagmatic factors independent of any particular morphemic alternations. Early
Generative Phonology tended to slight the study of distributional relations in
favour of morphophonemics, perhaps because word-relatedness phonology was
thought to be more productive of theoretical depth, reliably leading the ana-
lyst beyond the merely observable. But over the last few decades it has be-
come clear that much morphophonemics can be understood as accommoda-
tion to phonotactic requirements, e.g., Kisseberth (1970), Sommerstein (1974),
Kiparsky (1980), Goldsmith (1993), etc. A German-like voice-neutralising al-
ternation system resolves rapidly when the phonotactics of obstruent voicing
is recognised. And even as celebrated a problem in abstractness and opacity as
Yawelmani Yokuts vocalic phonology turns on a surface-visible asymmetry in
height-contrasts between long and short vowels.1
Distributions require nontrivial learning: the data do not explicitly indicate
the nature, or even the presence, of distributional regularities, and every dis-
tributional statement goes beyond what can be observed as fact, the ‘positive
evidence’. From seeing X in this or that environment the learner must somehow
conclude ‘X can only appear under these conditions and never anywhere else’ –
when such a conclusion is warranted.
246 Alan Prince and Bruce Tesar
so. All positive data that support the less-inclusive grammar also support the
broader grammar. More precisely, no fact consistent with the narrower grammar
can ever be inconsistent with the predictions of its more-inclusive competitor,
even though the broader grammar also allows forms not generable – ruled
out (negative evidence!) – by the less-inclusive grammar. This is the Subset
Problem, familiar from the work of Angluin (1980) and Baker (1979). If a
learner mistakenly adopts the superset grammar when the subset grammar is
correct, no possible positive evidence can contradict the learner’s adopted but
incorrect hypothesis. The misstep of choosing a superset grammar makes the
subset grammar unreachable (from positive evidence). By contrast, if on the
basis of the same ambiguous evidence the subset grammar is chosen, positive
evidence – observable forms licensed by one grammar but not the other – will
exist to move the learner along when the superset grammar is the actual target.
As an abstract example of the problem, consider two languages: $A = \{a^n, n \ge 1\}$,
which consists of every string of a's, and $AB = \{a^n b^m, n \ge 1, m \ge 0\}$,
which consists of any string of a's possibly followed by a string of b's. A
is a proper subset of AB. Suppose that the target language for learning is A.
If, upon encountering data from the target language (strings of a's), the learner
should guess that the language is AB, then there will be no positive evidence
contradicting this hypothesis, and the learner is stuck in error. Now consider
the situation in which the target of learning is AB. If the learner happens to
encounter data consisting only of strings of a's, and guesses A, an error has also
been made. But here positive evidence exists which contradicts this hypothesis –
namely strings containing b's – and when any such are encountered, the grammar
A can be rejected as inconsistent with observation.
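The asymmetry between the two guesses can be made concrete with a small sketch. It assumes, purely for illustration, that the two languages are encoded as regular expressions and that a learner simply tries hypotheses in a fixed order; the names `conservative_learner` and `overhasty_learner` are ours and do not correspond to any algorithm proposed in this chapter.

```python
import re

# Toy encodings of the two languages (an illustrative assumption).
GRAMMARS = {
    "A": re.compile(r"a+"),     # a^n, n >= 1
    "AB": re.compile(r"a+b*"),  # a^n b^m, n >= 1, m >= 0
}


def consistent(name, data):
    """True if every observed string is generated by the named grammar."""
    return all(GRAMMARS[name].fullmatch(s) is not None for s in data)


def conservative_learner(data):
    """Try the subset language A first; move to AB only when the data force it."""
    for name in ("A", "AB"):
        if consistent(name, data):
            return name
    return None


def overhasty_learner(data):
    """Try the superset language AB first."""
    for name in ("AB", "A"):
        if consistent(name, data):
            return name
    return None


# Target A: the overhasty guess is never contradicted by positive data.
only_as = ["a", "aa", "aaa", "aaaa"]
print(conservative_learner(only_as), overhasty_learner(only_as))

# Target AB: a single string containing b's corrects the conservative guess.
print(conservative_learner(["a", "aa", "aab"]))
```

However many strings of a's are added to the first data set, the overhasty learner's hypothesis remains consistent with them, which is the sense in which the subset grammar is unreachable from positive evidence.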
To see the relevance to phonotactic learning, consider the case of comple-
mentary distribution. Imagine a language in which s and š both appear: what is
the learner to conclude if the first encounter with š occurs in a word like šite?
Suppose that as in Nupe and (Yamato) Japanese and many other languages, š can
in fact only occur before i. The overhasty generaliser who decides that š
occurs freely can never be liberated from this notion, because every further
š-word that comes in is entirely consistent with it. The negative fact that š
fails to occur before a, e, o, u simply does not register on the positive-minded
learner. If the learner had opted for a narrower grammar – say one in which š
is derivable by palatalisation of s – not only would the irrecoverable overgen-
eralisation have been avoided, but there would be no difficulty in correcting
the hypothesis, should the target language turn out upon further sampling to
be more like English: observation of words like šem and šam would refute the
subset grammar.
Within the tradition of language learnability work, the standard response to
the subset problem is the Subset Principle of Berwick (1982): whenever more
than one language is consistent with the data, and there is a subset relation
between the languages, always select the grammar for the subset language.
Learning phonotactic distributions 247
That way, a grammar is not selected which overgenerates, that is, generates
forms which are illicit when the target language is the subset language.
It might seem that the subset principle is the last word to be had on the
issue: include the step in any learning algorithm, and the subset problem will
be avoided. However, implementation is far from trivial. Empirical linguistic
reality forces linguistic theories to permit large numbers of possible grammars,
and this makes it impractical to equip a learner with an exhaustive lookup-
table indicating subset relations between the languages generated by each and
every pair of grammars. Given the combinatorial structure of modern linguistic
theories, such a table would threaten to eclipse in size the linguistic system
giving rise to the grammars. Linguistic theory aims to ease the learning problem,
not to magnify it into something that requires a huge paralinguistic life-support
system.
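A rough count gives a sense of the scale involved. The sketch below assumes that the lookup table holds one entry per unordered pair of grammars, that a parametric theory with k binary parameters defines 2^k grammars, and that an OT system with n constraints defines n! total rankings. It counts grammars rather than distinct languages, so it is an upper bound on the table size, not a claim about any particular theory.

```python
from math import comb, factorial


def pairwise_table_size(num_grammars):
    """Number of unordered grammar pairs whose subset relation must be stored."""
    return comb(num_grammars, 2)


def parametric_grammars(num_binary_parameters):
    """Grammars defined by independent binary parameters."""
    return 2 ** num_binary_parameters


def ranking_grammars(num_constraints):
    """Total rankings of a constraint set."""
    return factorial(num_constraints)


for k in (3, 10, 20):
    print("parameters:", k, "table entries:", pairwise_table_size(parametric_grammars(k)))

for n in (5, 10):
    print("constraints:", n, "table entries:", pairwise_table_size(ranking_grammars(n)))
```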
A reasonable response would be to try to compute subset relations on the
basis of the individual analytical atoms of linguistic systems. For instance,
different settings of a particular parameter might result in languages with subset
relations. This approach has been commonly assumed when the subset principle
is discussed in the context of the Principles and Parameters framework (see, e.g.,
Jacubowitz 1984, Wexler and Manzini 1987). But it does not provide a complete
and principled solution. Clark (1992), using what he calls ‘shifted languages’,
shows that a learner can obey the subset principle on a parameter-by-parameter
basis, yet may still end up (incorrectly) in a superset language, when subset
relations depend on correlated parameter settings. Even more radically, when
the elements of a linguistic theory interact, in violation of the Independence
Principle of Wexler and Manzini (1987), the possible subset (and non-subset)
relationships between different settings of one parameter may depend crucially
upon how other parameters of the system are set.
Consider the following schematic example. Suppose a subsystem in the Prin-
ciples and Parameters framework has three binary parameters, as illustrated in
table 1. The subset/superset relations of languages distinguished by the settings
of P1 depend upon how the other parameters, P2 and P3, are set.
Observe the relationship between the languages defined for the opposing
values of P1. When, as in the first pair of rows, the other parameters are set
−P2/−P3, the −P1 and +P1 languages have no subset relations. When, as in
the middle pair of rows, the other parameters are set +P2/−P3, the language
with −P1 is a subset of the language with +P1. Finally, when the setting is
−P2/+P3, the language with −P1 is now a superset of the +P1 language. The
learner cannot assume in advance that a particular value for parameter P1 is
the subset value.2 Positing interaction is an almost inevitable consequence of
the drive for generality in theories of linguistic form, and in section 6 below,
we point to a realistic analogue within Optimality Theory (OT).
The Subset Principle, then, is really more of a goal than a computational
means for achieving that goal. It is a property possessed by a satisfactory the-
ory of learning-from-positive evidence: not a mechanism within the theory,
but a desired consequence of its mechanisms. Given a set of data and a set of
grammars which are consistent with that data, the learning device functions to
pick the grammar generating the language which is a subset of the languages
generated by the other consistent grammars, when the subset/superset rela-
tion exists. There is no guarantee as to the computational ease of determining
which of the grammars is the subset; indeed difficult theorem-proving may be
required.
The Subset Principle can be seen as an instance of the general occamite
analytical strategy: an inquirer should always select the ‘most restrictive’ hy-
pothesis consistent with the data. But the term ‘restrictive’ has a number of
different applications in the grammatical context, and it is worth while to sort
out the one that is relevant here. Under certain interpretations, it applies only
to candidate theories of Universal Grammar (UG) and not at all to individual
grammars. In one such sense, perhaps the primary sense of the term, a grammat-
ical theory is ‘restrictive’ to the degree that it provides few grammars consistent
with determinative data (cf. the concept of VC-dimension in formal learning
theory, as in Vapnik and Chervonenkis 1971); a ‘restrictive’ theory eases the
learning task by limiting ambiguity in the interpretation of the data. In another
sense, also applied at the same level, a theory of grammar is sometimes said
to be ‘restrictive’ if it allows for a relatively limited typology of distinct gram-
mars. A theory of UG that predicts only the basic word orders {SOV, SVO} is
sometimes said to be more restrictive than one that predicts, say, {SOV, SVO,
VSO}. This second sense is independent of the first and need have no rele-
vance to learnability, so long as the evidence for each of the options is readily
available and unambiguous. In contrast to these, the sense that interests us ap-
plies to individual grammars within a fixed theory of grammar: if grammar G1
generates a language which is a subset of the language of grammar G2, then
we will say that G1 is ‘more restrictive’ than G2. Yet further uses of ‘restric-
tive’ at this level of analysis are occasionally found, as the term comes to be
associated with any property presumed desirable that involves having less of
something. But criteria of relative grammatical desirability where no subset re-
lationship exists – for example, having fewer rules – have been considered to be
of limited concern for language learnability. Competing grammar-hypotheses
that do not lead to subset languages can be distinguished by data: each gener-
ated language has forms not found in the other, which can appear as positive
evidence.
The subset issue appears in full force in the learning of distributional reg-
ularities. Indeed, the descriptive language used to depict phonotactics usually
takes the form of stating what cannot occur in the language, even though the
data available as evidence depict only those things which can occur. For the
learning of phonotactic distributions, then, the goal is always to select the most
restrictive grammar consistent with the data. The computational challenge is
efficiently to determine, for any given set of data, which grammar is the most re-
strictive. Since psychological realism demands efficient computability, solving
this problem is crucial to the generative enterprise.
need Faithfulness to preserve s. Consequently, when the learner opts for the
markedness solution to the appearance of . . . ši . . . , a grammar banning *si is
automatically obtained; crucially, there is no contemplation of the unobserved
/si/ as a possible input. The distinction between s and š has been contextually
neutralised in the effort to reproduce observed . . . ši . . . against the background
supposition that š should not exist. By configuring a grammar to perform the
identity map under M » F bias, the learner will end up with a grammar that deals
with all possible inputs (Richness of the Base), without having to review each
of them.
• Durability of M » F structure. Learning takes place over time as more data
accumulate. The choice between Faithfulness and Markedness solutions recurs
at each stage of the process. It is not enough to set up an initial state in which
M » F; rather, this must be enforced throughout learning, at each step. This point
figures in Itô and Mester (1999b) and is also explicitly recognised by Hayes
(this volume). Further justification is provided in section 4.4 below.
The basic strategy, then, is to seek the most restrictive grammar that maps each
observationally validated form to itself. This grammar will give us $G(d_i) = d_i$
for every datum $d_i$ – the $d_i$ are the ‘fixed points’ of the mapping – and
should give us as few other output forms as possible, given arbitrary input.
Although we seek restrictiveness, we do not attempt directly to monitor the
contents of the output language, i.e., to compute the entirety of G(U), where
U is the set of universally possible inputs. In essence, we have set ourselves
a different but related goal: to find a grammar that minimises the number of
fixed points in the mapping from input to output. Each faithfulness constraint
demands that some piece of linguistic structure be fixed in the grammatical
mapping, and faithfulness constraints combine to force fixed-point mappings
of whole composite forms. By keeping the faithfulness constraints at bay, we
shall in general discourage fixed-point mappings in favour of descent to the
lowest Markedness state possible. The goal of fixed point minimisation is not
abstractly identical to language-based restrictiveness – they are the same only
when the generated language is exactly the set of fixed points – but they appear
to be quite close under standard assumptions about the nature of markedness
and faithfulness constraints.6
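The notion of a fixed point can be illustrated with a toy mapping. The sketch below assumes a four-form universe of inputs and two hand-written grammars, one that neutralises s and š by context and one that is fully faithful; both are hypothetical stand-ins for ranked constraint hierarchies, written directly as functions.

```python
SH = "\u0161"  # the segment š

# A deliberately tiny universe of possible inputs (an illustrative assumption).
UNIVERSE = {"si", SH + "i", "sa", SH + "a"}


def restrictive(form):
    """Markedness-dominated mapping: š before i, s elsewhere."""
    vowel = form[1:]
    return (SH if vowel == "i" else "s") + vowel


def faithful(form):
    """Faithfulness-dominated mapping: every input surfaces unchanged."""
    return form


def fixed_points(grammar, universe):
    """Inputs that the grammar maps to themselves."""
    return {u for u in universe if grammar(u) == u}


observed = SH + "i"
for grammar in (restrictive, faithful):
    # Both grammars reproduce the observed form, but they differ in how
    # many other forms they also leave untouched.
    print(grammar.__name__, grammar(observed) == observed,
          len(fixed_points(grammar, UNIVERSE)))
```

Both mappings treat the observed form as a fixed point; the restrictive one has two fixed points in this universe and the faithful one has four, so the former is preferred on the criterion of fixed-point minimisation.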
Minimisation of fixed points is, again, not something to be directly computed.
We need a measure that can be applied to the grammar itself, without reviewing
its behaviour over the set of possible inputs. The idea, of course, as in the
literature cited above, is that markedness constraints should be ranked ‘as high
as possible’ and faithfulness constraints ‘as low as possible’. We can move in
the direction of greater precision by specifying that the faithfulness constraints
should be dominated by as many markedness constraints as possible. This is
precise enough to generate a numeric metric on constraint hierarchies, which
we shall call the r-measure.
The measure is coarse, and it does not aim to reckon just the crucial M/F
interactions; but the larger the r-measure, the more restrictive the grammar
ought to be, in the usual case. Even so, the r-measure is not the end-all, be-all
characterisation of restrictiveness. It says nothing of the possible restrictiveness
consequences of different rankings among the markedness constraints or among
the faithfulness constraints. Below we shall see that the intra-Markedness issue
is solved automatically by the class of learning algorithms we deal with; the
intra-Faithfulness problems are distinctly thornier and will be the focus of much
of our discussion.
It is also not obvious (and perhaps not even likely) that for every data set
there is a unique hierarchy with the largest r-measure.7 But the r-measure is a
promising place to start. Given this metric, we can offer a concrete, computable
goal: any learning algorithm should return a grammar that, among all those
consistent with the given data, has the largest r-measure.
A conflict avoided: high vs. low. Before asking how to achieve this goal, it
is worth noting a couple of the r-measure’s properties. First, the most restrictive
possible grammar, by this measure, will have all of the faithfulness constraints
dominated by all of the markedness constraints, with an r-measure that is the
product of the number of faithfulness constraints and the number of markedness
constraints. Thus, for a given OT system, the maximum possible r-measure is
that product.
Second, selecting the hierarchy with the largest r-measure is consistent with
the spirit of ranking each of the markedness constraints ‘as high as possible’,
i.e., with respect to all other constraints, both Markedness and Faithfulness.
This distinguishes the r-measure from another possible measure: that of adding
together, for each faithfulness constraint, the number of constraint strata above
it – call it the s-measure. (A stratum is a collection of constraints undifferentiated
in rank, produced by the learning algorithms we employ.) Consider the stratified
hierarchy in (2): the r-measure is 2 + 3 = 5, while the s-measure is 1 + 3 = 4.
Now consider the stratified hierarchy in (3), which is just like (2) save that
the top stratum has been split, creating a dominance relationship between M1
and M2.
Adding this new intra-Markedness relationship does not directly change the
relationship between any of the faithfulness constraints and any of the marked-
ness constraints. Correspondingly, the r-measure of (3) is 5, the same as for
(2). However, the s-measure of (3) is 6, larger than the s-measure for (2). Max-
imising the s-measure will have the effect of demoting markedness constraints
in ways not directly motivated by the data. By maximising the r-measure in-
stead, learning can be biased towards having faithfulness constraints low in the
ranking without directly impacting the relative ranking of the markedness con-
straints. This turns out to be a good thing because, as discussed below, it permits
the continued use, when ranking the markedness constraints, of the principle of
Constraint Demotion (CD), which mandates that constraints be ranked as high
as possible. Using the r-measure instead of the s-measure to characterise ‘as low
as possible’ for faithfulness constraints avoids a possible conflict between such
low-ranking and the goal of ranking markedness constraints maximally high.
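The contrast between the two measures can be checked mechanically. In the sketch below a stratified hierarchy is represented as a list of sets, highest stratum first. The two hierarchies `H2` and `H3` are hypothetical: they are chosen only because they reproduce the figures quoted above for (2) and (3), and should not be read as the hierarchies actually displayed there.

```python
def r_measure(hierarchy, faithfulness):
    """Sum, over faithfulness constraints, of the markedness constraints
    in strictly higher strata."""
    total = 0
    markedness_above = 0
    for stratum in hierarchy:  # strata run from highest to lowest
        faith_here = [c for c in stratum if c in faithfulness]
        total += len(faith_here) * markedness_above
        markedness_above += len(stratum) - len(faith_here)
    return total


def s_measure(hierarchy, faithfulness):
    """Sum, over faithfulness constraints, of the number of strata above each."""
    total = 0
    for depth, stratum in enumerate(hierarchy):
        total += depth * sum(1 for c in stratum if c in faithfulness)
    return total


FAITH = {"F1", "F2"}

# Assumed hierarchies, consistent with the quoted values 5/4 and 5/6.
H2 = [{"M1", "M2"}, {"F1"}, {"M3"}, {"F2"}]
H3 = [{"M1"}, {"M2"}, {"F1"}, {"M3"}, {"F2"}]

print(r_measure(H2, FAITH), s_measure(H2, FAITH))
print(r_measure(H3, FAITH), s_measure(H3, FAITH))
```

Splitting the top stratum leaves the r-measure unchanged but raises the s-measure, which is why maximising the latter would reward intra-Markedness rankings that the data do not motivate.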
Distributional evidence has some inherent limitations: from data with only
CV syllables, one cannot distinguish whether epenthesis or deletion is respon-
sible for dealing with problematic inputs. In general, if UG allows a range of
solutions to a markedness constraint, distributional evidence will only assure
the learner that at least one of them is in use, without specifying which one.
The more tightly the range is specified, either by UG or by other informative
constraint interactions, the more the learner will grasp. But even in the case of
ambiguity, the analysis of distributional data will give excellent evidence about
relations within the markedness subcomponent of the grammar, and very good
evidence as well about the niches in the hierarchy where faithfulness constraints
fit. The stage investigated here will typically not fix the complete constraint hi-
erarchy for the entire phonology of the language, but it will aim to provide
a crucial infrastructure that limits the range of options that the learner must
consider. According to this programme of explanation, the increasingly sophis-
ticated learner will have the luxury of revising the constraint hierarchy devel-
oped through phonotactic learning as knowledge of morphophonemic relations
develops, using further techniques for hypothesising and testing nontrivial un-
derlying forms, ultimately arriving at the correct adult grammar.
Having taken via the r-measure a global stance on the restrictiveness of
grammars, the challenge we face is to devise an efficient and local step-by-
step algorithm that produces grammars in accord with it. The transition be-
tween global goal and local implementation is not entirely trivial, as we shall
see.
Our point of departure is an existing procedure for constructing a constraint
hierarchy from data: Recursive Constraint Demotion (RCD). RCD generates a
grammar consistent with a set of mark-data pairs. A mark-data pair (mdp) is an
elementary ranking argument: a competition between two candidates from the
same input, one of them optimal, the other not, along with the information about
their constraint-violation profiles. The member of the pair that is (desired to be)
optimal will be termed the winner; the other, suboptimal competitor will be
the loser. The important action takes place comparatively: the raw quantitative
violation data must be analysed to determine, for each constraint, which member
of each pair does better on that constraint, or whether the constraint is neutral
between them; mere violation is, of course, meaningless by itself. (This step
is known as mark cancellation.) We need a technical term to describe the
basic comparative relationship that emerges after mark cancellation: let us say
that a constraint prefers one candidate to another when the first candidate has
fewer violations than the second. Note that a candidate can satisfy a constraint
completely, yet still not be ‘preferred’; it all depends on how its mdp competitor
fares.
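The comparative step can be sketched as a function from two violation profiles to a per-constraint preference. The encoding is an assumption made for illustration: profiles are dictionaries from constraint names to violation counts, and the labels "W", "L" and "e" (winner-preferring, loser-preferring, neutral) are a notational convenience. The constraint names in the example are hypothetical.

```python
def compare(winner_violations, loser_violations):
    """Reduce raw violation counts to preferences: 'W' if the constraint
    prefers the winner, 'L' if it prefers the loser, 'e' if it is neutral."""
    constraints = set(winner_violations) | set(loser_violations)
    comparison = {}
    for c in constraints:
        w = winner_violations.get(c, 0)
        l = loser_violations.get(c, 0)
        if w < l:
            comparison[c] = "W"
        elif w > l:
            comparison[c] = "L"
        else:
            comparison[c] = "e"
    return comparison


# Hypothetical profiles for one winner/loser pair.
winner = {"Max": 1, "Onset": 2, "NoCoda": 0, "Dep": 0}
loser = {"Max": 0, "Onset": 2, "NoCoda": 1, "Dep": 0}

print(compare(winner, loser))
```

Here the constraint violated twice by both candidates and the constraint satisfied by both come out alike as neutral: only the difference between the two members of the pair is retained.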
A single mark-data pair contains information about how the constraints
must be ranked so that the desired winner actually wins. At least one of the
constraints preferring the winner must dominate all of the constraints prefer-
ring the loser. This, rephrased, is the Cancellation/Domination Lemma of Prince
and Smolensky (1993: 148). We know from Cancellation/Domination that the
loser-preferring constraints must all be subordinated in the ranking, and subor-
dinated to constraints that prefer the associated winners. Turning this around,
we see that the complementary set of constraints – those that do not prefer any
losers among the mark-data pair list, those that only prefer winners when they
prefer anything – can all be ranked at the top. The algorithm develops from this
observation.
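Both observations can be stated as short checks. The sketch assumes that each mark-data pair has already been reduced to a dictionary of preferences, with "W" for winner-preferring, "L" for loser-preferring and "e" (or absence) for neutral; the function names and the example constraints are hypothetical.

```python
def winner_wins(ranking, comparison):
    """Under a total ranking, the winner beats the loser exactly when the
    highest-ranked non-neutral constraint prefers the winner."""
    for constraint in ranking:  # highest-ranked first
        preference = comparison.get(constraint, "e")
        if preference == "W":
            return True
        if preference == "L":
            return False
    return False  # no constraint distinguishes the two candidates


def rankable_at_top(constraints, mdps):
    """Constraints that prefer no loser in any pair on the list."""
    return {c for c in constraints
            if all(pair.get(c, "e") != "L" for pair in mdps)}


# Two hypothetical ranking arguments over three constraints.
mdps = [
    {"F1": "W", "M1": "L"},
    {"M2": "W", "F1": "L"},
]

print(winner_wins(["M2", "F1", "M1"], mdps[0]))
print(winner_wins(["M1", "M2", "F1"], mdps[0]))
print(rankable_at_top({"M1", "M2", "F1"}, mdps))
```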
RCD applies to a constraint set coupled with a list of elementary ranking
arguments (mdps, possibly arising from different inputs), and recursively as-
sembles a constraint hierarchy consistent with the information in the mdp-list.
RCD does not aim to construct all possible totally ordered rankings consistent
with an mdp-list. The goal of RCD is to produce a single stratified hierarchy,
where the strata consist of non-conflicting constraints. Various total orderings
can be produced by ranking the constraints within the strata, retaining the dom-
ination order between strata. The stratified hierarchy produced by RCD has
a unique property: each constraint is placed in the highest stratum that it can
possibly occupy.
The goal of grammar-building is to organise the constraints so that all known
ranking arguments are simultaneously satisfied. If any such constraint hierarchies
exist, the stratified hierarchy produced by the RCD algorithm will be among
them.
RCD starts by collecting all the constraints that prefer only winners when
they prefer anything (they prefer no losers). These are placed at the top of the
hierarchy, forming a stratum of non-conflicting winner-preferring and winner-
loser-neutral constraints. The learner then dismisses from further consideration
all those mark-data pairs Wi ∼ Li on which some constraint just ranked prefers
winner Wi to loser Li. This move discards those competitors that lose on some
constraint in this stratum, and leaves behind those mark-data pairs where both
members have done equally well. In this way, the list of mark-data pairs shrinks
just as the list of constraints to be ranked is shrinking.
The learner then faces exactly the same problem again – ranking a set of con-
straints so as to be consistent with a list of ranking arguments, but with fewer
constraints and fewer mark-data pairs. This is a canonical recursive situation,
and RCD simply repeats its ranking-and-elimination procedure on the depleted
collection of mark-data pairs and constraints. The second stratum of the hier-
archy is now filled with those constraints that prefer only winners among the
mark-data pairs left over from the first round. If the ranking arguments in the
mark-data pairs list are mutually consistent, the RCD process will repeat until
all mark-data pairs have been dismissed, and all constraints have been placed
in the hierarchy. A grammar will have been found.
256 Alan Prince and Bruce Tesar
Should the data be inconsistent, a point will be reached where none of the
remaining constraints prefers only winners – each yet-to-be-ranked constraint
will prefer at least one loser on the remaining mark-data pairs list. These con-
straints cannot be ranked, and it follows that no grammar exists for the original
mark-data pairs list over the constraint set. Our focus here will be on finding
grammars for consistent data, but we note that detection of inconsistency via
RCD can play an important role whenever generation and pruning of tentative,
fallible interpretations of data are required, as shown in Tesar (1997, 1998).
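The ranking-and-elimination procedure, including the detection of inconsistency, can be sketched in a few lines of Python. This is an illustrative sketch rather than a definitive implementation: the function name `rcd`, the encoding of each mark-data pair as a dictionary from constraint names to 'W' or 'L' (blank cells omitted), and the constraint names A, B, C are all assumptions made for the example.

```python
def rcd(constraints, pairs):
    """Recursive Constraint Demotion over a list of comparative rows.

    constraints: list of constraint names.
    pairs: list of dicts mapping a constraint name to 'W' or 'L';
           a missing key stands for a blank (neutral) cell.
    Returns a list of strata (lists of names), highest stratum first.
    Raises ValueError when no hierarchy is consistent with the pairs.
    """
    remaining = list(constraints)
    active = list(pairs)
    hierarchy = []
    while remaining:
        # Constraints that prefer no loser among the surviving pairs.
        stratum = [c for c in remaining
                   if all(row.get(c) != "L" for row in active)]
        if not stratum:
            raise ValueError(f"inconsistent data; unrankable: {remaining}")
        hierarchy.append(stratum)
        remaining = [c for c in remaining if c not in stratum]
        # Dismiss every pair whose winner is preferred by a constraint just ranked.
        active = [row for row in active
                  if not any(row.get(c) == "W" for c in stratum)]
    return hierarchy


# Hypothetical consistent data: A must dominate B, and B must dominate C.
consistent = [{"A": "W", "B": "L"}, {"B": "W", "C": "L"}]
print(rcd(["A", "B", "C"], consistent))  # [['A'], ['B'], ['C']]

# Hypothetical inconsistent data: A >> B and B >> A cannot both hold.
inconsistent = [{"A": "W", "B": "L"}, {"A": "L", "B": "W"}]
try:
    rcd(["A", "B"], inconsistent)
except ValueError as err:
    print("no grammar:", err)
```

Because every constraint that prefers no loser is placed as soon as it becomes available, each constraint lands in the highest stratum it can occupy.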
Here is a brief example that illustrates the use of RCD. The mark-data pairs
list is presented in the form of a comparative tableau (Prince 2000, 2001a,b),
the appropriate representation for ranking arguments. Competing members of
a mark-data pair are shown as ‘x ∼ y’, where ‘x’ is the desired optimum or
winner, and ‘y’ is a competitor and desired loser. As always, we say that a
constraint prefers candidate x to candidate y if x has fewer violations of the
constraint than y does; and vice versa. If a constraint prefers one candidate of
the pair to the other, then a token that indexes the preferred candidate is placed
in the appropriate cell: ‘W’ if it prefers the winner, ‘L’ if it prefers the loser.
If a constraint prefers neither candidate (because both violate it equally), the
cell is left blank. A constraint hierarchy (arrayed left-to-right in domination
order, as usual) is successful over a mark-data pairs list if the first non-blank
cell encountered in each row, going left to right, contains W. This indicates that
the highest ranking constraint that distinguishes the members of the pair prefers
the winner. In terms of the comparative representation, then, the action of RCD
is to take a tableau of Ws and Ls and find an arrangement of the columns in
which all Ls are preceded by at least one W. Let us perform RCD.
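The success criterion on comparative tableaux lends itself to a direct check. The following Python sketch assumes a totally ordered ranking given as a list of constraint names, and rows encoded as dictionaries from constraint names to 'W' or 'L'; the function names and the two-row tableau are hypothetical.

```python
def first_nonblank(row, ranking):
    """Return the first 'W' or 'L' met in the row, reading in domination order."""
    for constraint in ranking:
        cell = row.get(constraint, "")
        if cell:
            return cell
    return ""  # the row is blank throughout


def is_successful(ranking, pairs):
    """True if, in every row, the highest-ranked non-blank cell is 'W'."""
    return all(first_nonblank(row, ranking) == "W" for row in pairs)


# Hypothetical tableau with two mark-data pairs.
pairs = [
    {"C1": "W", "C2": "L"},  # pair (a)
    {"C1": "L", "C3": "W"},  # pair (b)
]

print(is_successful(["C3", "C1", "C2"], pairs))  # True
print(is_successful(["C1", "C2", "C3"], pairs))  # False: in (b) the first non-blank cell is L
```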
Of the three constraints, C3 alone prefers only winners (prefers no losers). The
first round of RCD will therefore place C3 in the top stratum. This eliminates
mark-data pair (b) by virtue of the ‘W’ awarded by C3, leaving the following:
Now C1 prefers only winners (and more, albeit rather trivially: lacking neutral
cells, it prefers all winners). C1 goes into the second stratum and eliminates
mark-data pair (a), exhausting the mark-data pairs list and relegating C2 to the
lowest stratum.
Notice that C1 did not at first prefer-only-winners, prior to the elimination
of mark-data pair (b), because C1 prefers L2. Thus, we can say that C1 was
freed up for ranking when C3 entered the first stratum. The resulting hierarchy,
C3 » C1 » C2, leaves the tableau looking like this:
As promised, each row begins with a ‘W’ that settles the competition in favour
of the winner.
A full presentation of RCD, along with a formal analysis of its basic prop-
erties, can be found in Tesar (1995), and Tesar and Smolensky (2000); for an
informal account, see Prince (2000). Samek-Lodovici and Prince (1999) ex-
plores the relationship between harmonic bounding and RCD, and provides an
order-theoretic perspective on both. Prince (2001b) further explores the rela-
tionship between RCD and the logic of ranking arguments.
RCD applies not to raw data but to mark-data pairs: observable winners
paired with unobserved and ungrammatical forms (losers). The learner must
therefore be equipped to construct an appropriate mark-data pairs list from
observed data, algorithmically supplying the losing competitors. Two criteria
must be satisfied by any competitor-generating mechanism. First, the generated
loser must in fact be ungrammatical as an output for the given input. Second,
the comparison thus constructed should be informative: of all the multitudes of
ungrammatical outputs, the one chosen to compete should lead to the imposition
of new ranking relations through RCD; otherwise there is no point to computing
the comparison.
The method of error-driven learning discussed in Tesar and Smolensky (2000)
satisfies both criteria. Suppose a hypothesised grammar at some stage of learn-
ing produces an error, an incorrect mapping from input to output. Rectifying
the error requires re-ranking so that the desired candidate fares better on the
constraint hierarchy than the currently obtained but spurious optimum. RCD
will accomplish this if the new mark-data pair desired winner ∼ currently ob-
tained winner is added to the mark-data pair list. The grammar itself, through its
errors, thus supplies the meaningful competitors that are needed to improve its
performance.8 Other methods of finding useful competitors can be imagined –
for example, conducting some kind of search of the ‘nearby’ phonological or
grammatical space – but we will adhere to the error-driven method in the inter-
ests of focus, well-definition, and gaining a more precise understanding of its
strengths and weaknesses.
It is important to note that intermediate grammar hypotheses, based on partial
data, will often have strata containing constraints that turn out to conflict when
more mark-data pairs are considered. By consequence, such grammars will
tend to produce multiple outputs for each input, due to undecided conflicts, in
cases where the final grammar would produce only one. Following Tesar and
Smolensky (2000), we regard each output that is distinct from the observed
datum as an error to be corrected. We abstract away from issues of variation,
then, and work under the idealisation that each input corresponds to only one
legitimate output.
The course of learning is assumed to proceed over time as in the Multi-
Recursive Constraint Demotion (MRCD) of Tesar (1998). Rather than manip-
ulating a grammar-hypothesis per se by switching rankings around, the learner
develops and retains a permanent data base of mark-data pairs. A grammar hy-
pothesis can be generated easily from the current data base by the application
of RCD. From this point of view, the contribution of the phonotactic/identity-
map learning stage is precisely the accumulation of a valuable mark-data pair
data base. MRCD allows RCD to be utilised with ‘online’ learning, that is,
with progressive augmentation of the learner’s data base. The learner uses the
current grammar to apply error-driven learning to each form as it is observed.
When an error occurs on a form, the learner constructs a new mark-data pair
and adds it to the list (which was initially empty). A new constraint hierarchy
is then generated by applying RCD to the entire accumulated mark-data pairs
list, and the learner is ready to assess the next piece of data when it arrives. In
this way, the learner responds to incoming data one form at a time, constructing
and storing mark-data pairs only when necessary.
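The online regime just described can be sketched as follows. This is a simplified illustration, not the procedure of Tesar (1998) itself: it assumes a finite, explicitly listed candidate set for each observed form, it evaluates a stratified hierarchy by pooling violations within a stratum, and it makes a single pass over the data. The constraint names, candidates and violation counts are hypothetical.

```python
def rcd(constraints, pairs):
    """Recursive Constraint Demotion; returns strata, highest first."""
    remaining, active, hierarchy = list(constraints), list(pairs), []
    while remaining:
        stratum = [c for c in remaining
                   if all(r.get(c) != "L" for r in active)]
        if not stratum:
            raise ValueError("inconsistent mark-data pairs")
        hierarchy.append(stratum)
        remaining = [c for c in remaining if c not in stratum]
        active = [r for r in active
                  if not any(r.get(c) == "W" for c in stratum)]
    return hierarchy


def optima(candidates, hierarchy):
    """Candidates surviving stratum-by-stratum filtering (marks pooled per stratum)."""
    survivors = list(candidates)
    for stratum in hierarchy:
        score = {k: sum(candidates[k].get(c, 0) for c in stratum)
                 for k in survivors}
        best = min(score.values())
        survivors = [k for k in survivors if score[k] == best]
    return survivors


def compare(winner, loser, constraints):
    """Build a comparative row from two violation profiles; blanks are omitted."""
    row = {}
    for c in constraints:
        w, l = winner.get(c, 0), loser.get(c, 0)
        if w != l:
            row[c] = "W" if w < l else "L"
    return row


def mrcd(constraints, observations):
    """Accumulate mark-data pairs from errors, re-running RCD after each one."""
    pairs = []                            # the data base starts out empty
    hierarchy = rcd(constraints, pairs)   # initially a single stratum
    for observed, candidates in observations:
        while True:
            rows = [compare(candidates[observed], candidates[k], constraints)
                    for k in optima(candidates, hierarchy) if k != observed]
            rows = [r for r in rows if r]  # ignore indistinguishable competitors
            if not rows:
                break                      # no error on this form
            pairs.append(rows[0])          # store one informative competitor
            hierarchy = rcd(constraints, pairs)
    return hierarchy, pairs


# Hypothetical learner observing the form 'pat' (input identical to output).
constraints = ["M:NoCoda", "F:Max", "F:Dep"]
observations = [
    ("pat", {"pat": {"M:NoCoda": 1}, "pa": {"F:Max": 1}, "pata": {"F:Dep": 1}}),
]
hierarchy, pairs = mrcd(constraints, observations)
print(hierarchy)  # [['F:Max', 'F:Dep'], ['M:NoCoda']]
print(pairs)      # [{'M:NoCoda': 'L', 'F:Max': 'W'}]
```

Note that plain RCD leaves both faithfulness constraints at the top here, although only one of them was involved in an error.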
No such evidence will ever arise for faithfulness constraints when the identity
map is optimal. Faithfulness will be demoted only when non-identity mappings
are desired; that is, when input and output differ crucially, a situation that for
us requires knowledge of morphophonemic alternations. A strictly RCD-based
learner cannot grasp distributions from positive evidence alone.
RCD must therefore be reconfigured so as to impose high ranking on every
constraint except the faithfulness constraints. The modified algorithm should place a faithfulness
constraint into a hierarchy only when absolutely required. If successful, this
strategy will yield a hierarchy with a maximal r-measure. Let us call the new al-
gorithm – or rather class of algorithms, since a number of variants are possible –
Biased Constraint Demotion (BCD): the ranking procedure will be biased
against faithfulness and in favour of markedness constraints as candidates for
membership in the stratum under construction.
On each recursive pass, RCD places into the next stratum those constraints
that prefer only winners among the mark-data pairs list. Our basic principle
takes the form of the following modification to RCD, given in (4):
(4) Faithfulness Delay. On each pass, among those constraints suitable
for membership in the next stratum, if possible place only marked-
ness constraints. Only place faithfulness constraints if no markedness
constraints are available to be placed in the hierarchy.
In this way, the placement of faithfulness constraints into the hierarchy is de-
layed until there are no markedness constraints that can do the job. If all the
mark-data pairs are consistent with a hierarchy having the entire set of faithful-
ness constraints at the bottom, that is what will be returned.
However, the data may require at least one faithfulness constraint to dominate
some markedness constraint. This will happen only when all of the constraints
available for placement into the next stratum are faithfulness constraints. At
this point, the algorithm must place at least one faithfulness constraint into the
next stratum in order to continue.
What is not obvious is what, precisely, to do at this point. At least one
faithfulness constraint must be ranked, but which one?
At this stage, neither M1 nor M2 can be ranked – both prefer some loser
(marked by L in the cell). If we rank F2, we accomplish nothing, since the set
of mark-data pairs remains the same. If we rank F1, however, the problematic
mark-data pair (b) will be eliminated, taking with it the L-mark that holds M1
in check. With mark-data pair (b) gone, M1 is ‘freed up’ and may be ranked
next.
As a first step towards identifying the best F constraints to rank, we can
set aside a constraint-type that is definitely not necessary to rank: the F that
prefers no winners, being neutral over the mark-data pairs at hand. Any such
constraint, when ranked, will never eliminate any mark-data pairs and cannot
free up markedness constraints. This gives us our second principle:
(5) Avoid the Inactive. When placing faithfulness constraints into the
hierarchy, if possible only place those that prefer some winner.9 If the
only available faithfulness constraints prefer no remaining winners,
then place all of them into the hierarchy.
Observe that the second clause only applies when the bottom-most stratum is
being constructed. Following the principle will ensure that completely inactive
faithfulness constraints always end up at the very bottom of the hierarchy.
(Completely inactive markedness constraints will, of course, end up at the top.)
With this much of a selection-criterion at hand, we can give a version of the
Biased Constraint Demotion algorithm. This is not the final version of our
proposal, but it expresses the ideas presented thus far. We write it out as generic
pseudocode, with italicised comments on the side.
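Principles (4) and (5) can also be combined in a short Python sketch. It is offered as an illustration under stated assumptions, not as a transcription of the pseudocode: the function name `bcd_v1`, the dictionary encoding of tableau rows, and the four-constraint tableau are hypothetical.

```python
def bcd_v1(markedness, faithfulness, pairs):
    """Biased Constraint Demotion, first version: principles (4) and (5) only.

    pairs: list of dicts mapping a constraint name to 'W' or 'L'
           (a missing key is a blank cell).
    Returns a list of strata, highest first.
    """
    remaining = list(markedness) + list(faithfulness)
    active = list(pairs)
    hierarchy = []
    while remaining:
        # Constraints that prefer no loser among the surviving pairs.
        rankable = [c for c in remaining
                    if all(row.get(c) != "L" for row in active)]
        if not rankable:
            raise ValueError("inconsistent mark-data pairs")
        rankable_m = [c for c in rankable if c in markedness]
        if rankable_m:
            # (4) Faithfulness Delay: place markedness constraints only.
            stratum = rankable_m
        else:
            # (5) Avoid the Inactive: prefer F constraints that prefer some winner;
            # if none does, place all the remaining rankable F constraints.
            active_f = [c for c in rankable
                        if any(row.get(c) == "W" for row in active)]
            stratum = active_f or rankable
        hierarchy.append(stratum)
        remaining = [c for c in remaining if c not in stratum]
        active = [row for row in active
                  if not any(row.get(c) == "W" for c in stratum)]
    return hierarchy


# Hypothetical tableau: F1 must dominate M1, and M1 must dominate M2.
pairs = [
    {"M1": "W", "M2": "L"},  # pair (a)
    {"M1": "L", "F1": "W"},  # pair (b)
]
result = bcd_v1(["M1", "M2"], ["F1", "F2"], pairs)
print(result)  # [['F1'], ['M1'], ['M2'], ['F2']]
```

In this hypothetical case the inactive F2 is passed over in the first round and ends up alone in the bottom stratum.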
4.2.2 The logic of the algorithm (version 1)
The initial If-clause ensures
that faithfulness constraints are not ranked when markedness constraints are
available. Markedness constraints will continue to be ranked as high as possible,
because they are placed into the ranking as soon as they prefer-only-winners
among the current mark-data pairs list. Faithfulness constraints are placed into
the hierarchy when they are the only ones available for ranking. In such a
circumstance, all remaining markedness constraints must be dominated by at
least one of the rankable faithfulness constraints.
The bracketed, bolded section of the algorithm contains the conditions de-
termining which among the available faithfulness constraints will be ranked; it
is this section that we must scrutinise and sharpen. Currently, it places a nec-
essary condition, one that will persist, but needs further elaboration. Looking
only at the isolated behaviour of individual F constraints, it picks up the weak-
est form of activity. All that is asked for is that an F constraint prefers some
winner; the behaviour of other constraints on the relevant mark-data pairs is not
considered.
Remark. Because discussion in the literature often concentrates only on M/F
conflict, it is important to recall that for a markedness constraint M1 to be
evidentiary hint is offered as to what an input like /ša/ should lead to, when ša
has never been observed. The information gained from errors is not evidence
about what is missing from the overt, observable language. This is the key
question, and the one that cannot be asked: is š free or restricted in distribution?
Is ša in the language or not? In the absence of positive evidence to the contrary,
BCD aims to resolve such questions in favour of the restrictive pole of the
disjunction.
{M1, M2, …, Mn} » {F1, F2, …, Fk}.
|←     M     →|   |←     F     →|
Other articulations can be added to the initial state, e.g., Markedness scales
enforced through fixed domination relationships, etc. As with the M » F bias,
any such additional fixed relationships must be actively enforced throughout
learning. Ongoing respect for fixed constraint relationships is not an additional
burden created by the RCD-based approach; any theory of learning constraint
rankings must achieve it. Under the approach proposed here, based on RCD,
all such relationships must be continuously active, and the initial state follows
as a consequence.
Should we rank F1, F2, or both? Observe that F2 is quite useless, even though
it prefers a winner and is therefore active in the sense used above. By contrast,
F1 conflicts with M1 and must be ranked above it. (Constraint conflict means
that ‘W’ and ‘L’ both occur in the same row.) Even better, when F1 is ranked,
it frees up M1. Because F2 conflicts with nothing, it makes no difference at all
where it is ranked. By the r-measure, it should be ranked at the bottom. This
would also be true if M1 preferred the winner for mark-data pair (b). The first
moral, then, is that F constraints should be ranked only if they conflict with
other constraints.
The present approach already achieves this result without the need for further
intervention. In cases of non-conflict where F is active, the loser is harmonically
bounded by the winner. (Harmonic bounding by the winner in comparative
tableaux is signalled by the occurrence of a tableau row containing only Ws
and possibly blanks.) The error-driven competitor-selection mechanism will not
find harmonically bounded losers, because they never appear as winners for any
provisional hierarchy. Consequently, no such competitions will arise under the
error-driven regime, and no special clauses will be required to interpret them.
In addition, error-driving aside, it should be noted that some logically possible
patterns may simply not occur: winner/loser pairs distinguished by Faithfulness
but not by Markedness (as in mark-data pair (b) above).
Freeing up. Conflict is a necessary precondition for freeing up, because ev-
ery removal of an L-containing row involves conflict resolution. But it is not
sufficient. All Ls in a constraint column must go, if the (M) constraint is to be-
come rankable. In general, there may be many paths to freedom. Let us examine
two fundamental complications: gang-effects, and cascading consequences.
F-Gangs. It can easily be the case that no single F constraint working by
itself can free up a markedness constraint, yet a set of them can combine to do
so. Here is an illustration:
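A gang effect of this kind can be reproduced with a brute-force search for the smallest freeing F-sets. The Python sketch below is a hypothetical construction for illustration only: the function names and the two-row tableau are assumptions, and the sketch presupposes that no faithfulness constraint prefers a loser, as in the identity-map setting.

```python
from itertools import combinations


def frees(f_set, m, pairs):
    """True if ranking f_set dismisses every pair on which m prefers the loser."""
    blocking = [row for row in pairs if row.get(m) == "L"]
    return bool(blocking) and all(
        any(row.get(f) == "W" for f in f_set) for row in blocking
    )


def minimal_freeing_sets(faithfulness, markedness, pairs):
    """Return all smallest sets of F constraints that free up some M constraint."""
    for size in range(1, len(faithfulness) + 1):
        found = [set(combo) for combo in combinations(faithfulness, size)
                 if any(frees(combo, m, pairs) for m in markedness)]
        if found:
            return found
    return []


# Hypothetical tableau: M1 prefers the loser in both rows, and each
# faithfulness constraint prefers the winner in only one of them.
pairs = [
    {"M1": "L", "F1": "W"},
    {"M1": "L", "F2": "W"},
]
gang = minimal_freeing_sets(["F1", "F2"], ["M1"], pairs)
print(gang)
```

Neither F1 nor F2 alone removes both L-marks from the M1 column, so the only freeing set returned is the pair of them together.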
Neither M constraint can be placed at this point, since each prefers at least
one loser (cells marked L). We must look to the F constraints. Both F1 and F2
are equally suitable by the freeing-up criterion, and each forms a smallest set
of rankable F. If F2 is ranked first, the following hierarchy results:
(7) F2 » M2 » F1 » M1 r = 1
If F1 is ranked first, though, the mark-data pair (a) is eliminated, and with
(a) gone, the constraint M1 suddenly becomes available for ranking. Ranking
M1 next then frees up M2 for ranking, and F2 drops to the bottom. The result
is hierarchy (8):
(8) F1 » M1 » M2 » F2 r = 2
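The two r values can be checked mechanically. The sketch below assumes that the r-measure of a hierarchy is the sum, over the faithfulness constraints, of the number of markedness constraints that dominate each one; this reading reproduces the values given for (7) and (8). The function name and the list-of-strata encoding are hypothetical.

```python
def r_measure(hierarchy, markedness, faithfulness):
    """Sum, over F constraints, of the number of M constraints in higher strata."""
    total = 0
    m_above = 0  # markedness constraints seen in strictly higher strata
    for stratum in hierarchy:
        f_here = sum(1 for c in stratum if c in faithfulness)
        total += m_above * f_here
        m_above += sum(1 for c in stratum if c in markedness)
    return total


M = ["M1", "M2"]
F = ["F1", "F2"]

hierarchy_7 = [["F2"], ["M2"], ["F1"], ["M1"]]
hierarchy_8 = [["F1"], ["M1"], ["M2"], ["F2"]]

print(r_measure(hierarchy_7, M, F))  # 1
print(r_measure(hierarchy_8, M, F))  # 2
```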
These considerations indicate that the r-measure can respond to effects that
are not immediately visible in the next round of ranking, but are consequences
of that next round, or consequences of consequences, and so on. Here again
a tactical issue arises: to pursue the global r-measure unrelentingly would
probably require, in the worst case, sorting through something close to all
possible rankings. It is doubtful that there is much pay-off in this mad pursuit.
We conjecture that it will suffice for linguistic purposes to consider only the
M-freeing consequences up to the next point where an F constraint must be
ranked, i.e., until the cascade of M constraints arising from F-placement runs
out. This leads to our last principle:
(9) Richest Markedness Cascade. When placing faithfulness constraints
into the hierarchy, if more than one F-set freeing up some markedness
constraint is of smallest size, then place the F-set that yields the largest
set of M constraints in contiguous subsequent strata, i.e., until another
F constraint must be ranked. If there is more than one of these, pick
one at random.
As with the smallest-F-set criterion, it is possible with a sufficient number
of suitably interacting M and F constraints to craft a case where going for a
smaller immediate M-cascade is offset by gains further on.11 More investigation
is needed to see what role, if any, this might play in linguistic systems, and
whether such distinctions actually lead to discernible differences in substantive
restrictiveness. If, as we suspect, they do not, then the suggested criterion may
be more than just a reasonable heuristic compromise between the r-measure and
computational efficiency. On the other hand, if distinctions of richness between
immediate cascades fail to lead to discernible differences in restrictiveness of
the resulting grammars, then it might even be preferable to forgo the effort of
computing them as well.
We note that the algorithm could be pushed in the direction of greater com-
plexity by being restructured to chase down ever more distant consequences
of ranking decisions. It could also be scaled back, if its level of sensitivity to
the ultimate M»F structure turns out to be excessive when the limitations of
realistic constraints are taken into account. At present, it stands as a reasonable,
if in part conjectural, compromise between the global demands of the r-measure
and the need for local computability.
    End-Repeat
    If FreeingFaithSets contains only one faithfulness constraint set
        Set BestFaithSet to be the constraint set in FreeingFaithSets
    Else
        Set BestFaithSet to the constraint set returned by
            Select-best-faith-set(FreeingFaithSets)
    End-if
    Return(BestFaithSet)

Choosing Between Same-sized Minimal Faithfulness Constraint Subsets

Select-best-faith-set(FreeingFaithSets):
    For each set FaithSet-y in FreeingFaithSets
        Place FaithSet-y in the next stratum of the constraint hierarchy
            under construction
        Continue BCD forward until another faithfulness constraint must be
            placed in the hierarchy
        Set Value-y to the number of markedness constraints ranked after the
            placement of FaithSet-y
    End-for
    Set BestFaithSet to the member FaithSet-y of FreeingFaithSets
        with the largest Value-y
    If there is a tie for the largest Value-y, pick one of them at random
    Return(BestFaithSet)
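The selection step can be rendered in Python roughly as follows. This is a sketch under assumptions rather than a definitive implementation: the helper `m_cascade`, the dictionary encoding of tableau rows, and the two-row tableau (constructed so that F1 frees M1, which in turn frees M2, while F2 frees only M2) are hypothetical.

```python
import random


def m_cascade(f_set, markedness, pairs):
    """Count the M constraints ranked after placing f_set, before another F is needed."""
    # Placing f_set dismisses every pair on which one of its members prefers the winner.
    active = [r for r in pairs if not any(r.get(f) == "W" for f in f_set)]
    unranked = list(markedness)
    placed = 0
    while True:
        free = [m for m in unranked
                if all(r.get(m) != "L" for r in active)]
        if not free:
            return placed  # the cascade has run out
        placed += len(free)
        unranked = [m for m in unranked if m not in free]
        active = [r for r in active
                  if not any(r.get(m) == "W" for m in free)]


def select_best_faith_set(freeing_sets, markedness, pairs, rng=random):
    """Pick the F-set with the richest markedness cascade; break ties at random."""
    values = [m_cascade(s, markedness, pairs) for s in freeing_sets]
    best = max(values)
    ties = [s for s, v in zip(freeing_sets, values) if v == best]
    return rng.choice(ties)


# Hypothetical tableau over the as yet unranked constraints M1, M2, F1, F2.
pairs = [
    {"M1": "L", "F1": "W"},              # pair (a)
    {"M1": "W", "M2": "L", "F2": "W"},   # pair (b)
]
unranked_m = ["M1", "M2"]

print(m_cascade(("F1",), unranked_m, pairs))  # 2
print(m_cascade(("F2",), unranked_m, pairs))  # 1
best = select_best_faith_set([("F1",), ("F2",)], unranked_m, pairs)
print(best)  # ('F1',)
```

The `markedness` argument is meant to hold only the markedness constraints not yet placed in the hierarchy under construction.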
only, consider the observation that da occurs in the language. What should the
learner deduce?
This is the canonical situation in which some F must be ranked, since the
only available M prefers the loser. The tableau puts both F constraints on an
equal footing, and BCD would pick randomly between them. But the choice is
consequential. If the general constraint is placed, then no amount of rat and tat
will inspire the positive-only German learner to imagine that rad and tad are
forbidden. But if the special constraint is placed, the German learner is secure,
and the English learner can be disabused of over-restrictiveness by subsequent
encounters with positive data.
A plausible move at this point would be to add an injunction to the F-selection
clause of BCD, along the lines of Hayes (1999: 19, this volume) and Smith
(1999), giving priority of place to the special constraint (or to the most special,
in case of a more extended hierarchy of stringency). We are reluctant to take this
step, because it does not solve the general problem. There are at least two areas
of shortfall, both arising from the effects of constraint interaction. Both illustrate
that the relation of stringency is by no means limited to what can be computed
pairwise on the constraints of CON. First, two constraints that have only partial
overlap in their domain of application can, under the right circumstances, end
up in a special-to-general relationship. Second, the special/general relation
that is relevant to restrictiveness can hold between constraints that seem quite
independent of each other. Let us review each in turn.
Derived subset relations. The first effect is akin to the one reviewed above
in section 1 for parameter-setting theories, where dependencies among the
parameters can reverse or moot subset relations. Two constraints which do
not stand as special to general everywhere can end up in this relation within
a hierarchy where higher ranking constraints reshape the candidate set in the
right way. Because the set of candidates shrinks as evaluation proceeds down the
hierarchy, the general Venn diagram for overlap between domains of relevance
of two constraints can lose one of its ‘ears’ at a certain point, converting overlap
into mere subsetting.
Consider two such generally overlapping constraints: F:Ident/σ1(φ), which
demands identity when the φ-bearer is in the first syllable, and F:Ident/σ́(φ),
which demands identity when the φ-bearer lies in a stressed syllable. Neither
environment is a subset of the other over the universal set of forms, but if
there are higher ranked constraints that ensure stress on initial syllables as
well as other stresses elsewhere in the word, then suddenly any violation of
/pá:to/         M:InitialStress   F:Ident/σ1(φ)   F:Ident/σ́(φ)    M:*V+
pá:to ∼ páto                      W               W               L
The learner who picks the stress-based condition will have made an irrecoverable
overgeneralisation error if the target language only allows long vowels in
the first syllable. (As always, if the learner chooses σ1 incorrectly, when the target
language allows long vowels in all stressed syllables, or freely distributed
without regard to stress, then positive evidence will be forthcoming to force
correction.)
The direction of implication will be reversed in a language which allows at
most one stressed syllable, always initial, so long as words with no stressed
syllables also occur. Here Faithfulness in σ1 implies Faithfulness in σ́, but not
vice versa. (As for the plausibility of this case, we note that although stress is
usually required in all content words, in languages where stress behaves more
like pitch accent, stressless words may indeed occur; see, e.g., the discussion
of Seneca in Chafe (1977: 178–180) and Michelson (1988: 112–113).)
This result effectively rules out a solution to the general problem that is
based on direct comparison of the internal syntactic structures of constraints, in
the manner of Koutsoudas, Sanders, and Noll (1974) or Kiparsky (1982). The
constraint-defining expression simply does not contain the required informa-
tion, which is contextually determined by the operation of individual constraint
hierarchies on individual candidate sets. We tend to regard this tactic as dis-
tinctly implausible anyway, since it requires a meta-knowledge of the structure
of formal expressions which is otherwise quite irrelevant to the working of the
theory. Nor is it plausible to imagine that the learner is somehow equipped to
characterise surviving sets of candidates in such a way that subset relations
could be computed. Here again, an appeal would have to be made to a kind
of meta-knowledge which the ordinary functioning of the theory quite happily
does without, and which will be in general quite difficult to obtain.
Third party effects. To see how independent-seeming constraints may be
tied together by the action of a third constraint, let us examine a constructed
voicing-agreement pattern of a type that can be legitimately abstracted from
initial M»F stage), will find competitors for ba (pa is less marked no matter
what the ranking of the M constraints is), ab (ap is less marked) and azba (aspa
is less marked). This yields the following tableau:
Recall that the left-hand member of each pair x ∼ y is not only the desired
winner, but also an exact match with its assumed input.
BCD puts the constraint M:Agree at the top, as shown, but after that, no
other M constraint is immediately rankable. Each remaining active F frees up
an M constraint; which should be chosen?
• BCD as currently formulated unambiguously chooses the general F:stop-voi,
  because it frees up both remaining M constraints. The other two F
  constraints free up only one M constraint.
• The doctrine of preferring special to general favours the placement of
  F:stop-voi/Ons over placing its general cognate, regardless of the diminished
  freeing-up capacity of the special constraint. But this doctrine makes no
  distinction between F:stop-voi/Ons and F:fr-voi, which are not members
  of the same stringency family.
Each choice determines a distinct grammar with different predictions about
what additional forms are admitted (see Appendix 3 for details). High-ranking
of F:stop-voi leads to this:
It is worth noting that BCD has successfully found the hierarchy with the
optimal r-measure. It is the r-measure’s characterisation of restrictiveness that
has failed.
Stepping back from the details, we see that the path to greatest restrictiveness
will be obtained by ranking those F constraints with, roughly speaking, fewest
additional consequences. In the case of an F: vs. F:/Ons pair, it is clear that
the restricted version will have this property. But here we face an additional
choice, from outside a single stringency family. F:stop-voi/Ons is by no
means a special case of F:fr-voi, and indeed one might naively imagine them
to be completely independent – what should az have to do with ba? In isolation,
nothing: yet they are connected by the force of M:Agree in azba.
It is tempting to imagine that the notion of fewest additional consequences
might be reducible to some computable characteristic of performance over the
mark-data pair list. In the case of the special/general stringency pair operative
here, observe that the special F constraint deploys fewer Ws than the general
version. This is precisely because the special constraint is relevant to fewer
structural positions and will be vacuously satisfied in regions of candidate space
where the general version assesses violations. Since in the current learning
context the optimal form satisfies all faithfulness constraints, it either ties or
beats the suboptimum on each of them. One might then attempt to capitalise on
this observable property, and add a bias to the algorithm favouring an M-freeing
F constraint that assesses fewest Ws over the mark-data pair list.13 (A similar
effect would be obtained if the algorithm monitored and favoured vacuous
satisfaction.) But the data that are available to the learner under error-driven
learning will not always include the relevant comparisons.14 In the case at hand,
it is the dangerously general F:fr-voi that is observably the most parsimonious
of Ws, using just one while successfully freeing up M:*z.
It appears that the algorithm should choose the F constraint that is relevant to
the narrowest range of structural positions. By this, any F/P for some position P
should be favoured over any F, given the choice, even if they constrain different
elements. But in general this property need not be obviously marked in the
constraint formulation itself. Between F/P1 and F/P2 the choice will be more
subtle, as we saw in our first example, and subject to conditioning by other
constraints.
We must leave the issue unresolved, save for the certainty that managing
the special/general relationship cannot be reduced to statements about the
Suppose now that the learner must deal with the form da.
number of distinct hypotheses: one for each non-empty subset of Ws. And this
statement gives the hypothesis set for just one mark-data pair. Multiplying them
all out is sure to produce an impressive tangle of ranking arguments. Constraint
demotion avoids the problem of ‘detangling the disjunction’ of the constraints
favouring the winner, by instead focusing on demoting the constraints in the
conjunction, those preferring the loser. When rendered into an algorithm, the
effect of focusing on the conjunction is to leave constraints as high as possible,
because constraints are only demoted when necessary, and then only as far
down as necessary. A correct hierarchy is guaranteed to emerge, even though
the algorithm has never attempted to determine precisely which constraints
must dominate which others, satisfying itself with finding out instead which
constraints can dominate others.
But when we attempt to determine which faithfulness constraints to rank next,
we are precisely trying to ‘detangle the disjunction’, by picking from among
the (faithfulness) constraints preferring the winner(s). Attempting to select from
among the faithfulness constraints preferring winners steers a course straight
into the formal problem that constraint demotion avoids. This does not inspire
great confidence in the existence of an efficient fully general solution to the
formal problem.
Fortunately, a fully general solution is probably not required: the choice is
only made from among faithfulness constraints, and only under certain circum-
stances. Recall the simple example of indeterminacy: F1 must dominate M1,
and F2 must dominate M2, with no dependencies indicated by the mark-data
pairs between (F1 and F2) or between (M1 and M2). If there are in fact no
dependencies at all (indicated by the mark-data pairs or not) between (F1 and
F2), (M1 and M2), (F1 and M2), or (F2 and M1), then the choice will not matter.
The same grammar should result from either of the hierarchies in (16) and (17).
(16) F1 » M1 » F2 » M2
(17) F2 » M2 » F1 » M1
Indeed, ranking (18) should also give the same grammar, although it would
fare worse on the r-measure than the other two.
(18) {F1, F2} » {M1, M2}
In such a circumstance, there really would be no harm in randomly picking
one of {F1, F2} to rank first. So, the fact that there is no obvious basis for a
choice does not automatically mean that there is a ‘wrong choice’ with attendant
overgeneration.
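The r-measure comparison among (16)–(18) can be checked mechanically. The sketch below assumes that the r-measure of a hierarchy is the sum, over all Faithfulness constraints, of the number of Markedness constraints dominating each one; the function name `r_measure` and the encoding of a hierarchy as a list of strata are illustrative choices, not part of the original exposition.

```python
def r_measure(hierarchy, is_markedness):
    """Sum, over each Faithfulness constraint, the Markedness constraints ranked above it.

    hierarchy is a list of strata (highest first); each stratum is a list of names.
    Constraints in the same stratum do not dominate one another.
    """
    total = 0
    m_above = 0  # Markedness constraints found in strictly higher strata
    for stratum in hierarchy:
        f_count = sum(1 for c in stratum if not is_markedness(c))
        total += f_count * m_above
        m_above += sum(1 for c in stratum if is_markedness(c))
    return total


def is_m(name):
    # Assumed naming convention: Markedness constraints start with "M".
    return name.startswith("M")


h16 = [["F1"], ["M1"], ["F2"], ["M2"]]   # ranking (16)
h17 = [["F2"], ["M2"], ["F1"], ["M1"]]   # ranking (17)
h18 = [["F1", "F2"], ["M1", "M2"]]       # ranking (18)

for label, h in (("(16)", h16), ("(17)", h17), ("(18)", h18)):
    print(label, r_measure(h, is_m))
```

On this encoding, (16) and (17) tie, and (18) scores lower, which is the sense in which it fares worse while still defining the same grammar.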
More generally, how much the learner needs to care about such choices
will depend upon the structure of the Faithfulness component of UG. This
leads to a request for a more informed theory of Faithfulness, in the hopes
280 Alan Prince and Bruce Tesar
that a more sophisticated understanding will yield a better sense of how much
computational effort the learner ought to exert in making these ranking decisions
during learning.
(19)
        M1   M2   F1   F2
I            L         W
II      L         W
III          L    W
The rankings required by {I, II, III} are these: F1 » {M1 M2} and F2 » M2. While
RCD will simply put F1 and F2 together at the top of a two-stratum hierarchy,
BCD gives the following:
{F1} » {M1} » {F2} » {M2}    r = 1
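The RCD outcome mentioned here can be reproduced with a short sketch of Recursive Constraint Demotion. The L and W cells below are inferred from the rankings the text attributes to each mark-data pair (F2 » M2 for I, F1 » M1 for II, F1 » M2 for III); the function name `rcd` and the dictionary encoding are assumptions made for illustration. The sketch covers plain RCD only and does not attempt the additional Markedness-first and Faithfulness-selection steps of BCD.

```python
def rcd(constraints, pairs):
    """Recursive Constraint Demotion over winner/loser preference rows.

    pairs maps a pair name to {constraint: "W" or "L"}; unlisted constraints
    are neutral on that pair. Returns a list of strata, highest first.
    """
    remaining = list(constraints)
    active = dict(pairs)
    hierarchy = []
    while remaining:
        # A constraint can be placed now if it prefers no loser
        # in any pair that is still unaccounted for.
        stratum = [c for c in remaining
                   if all(row.get(c) != "L" for row in active.values())]
        if not stratum:
            raise ValueError("inconsistent data: no constraint can be ranked")
        hierarchy.append(stratum)
        remaining = [c for c in remaining if c not in stratum]
        # A pair is accounted for once a placed constraint prefers its winner.
        active = {name: row for name, row in active.items()
                  if not any(row.get(c) == "W" for c in stratum)}
    return hierarchy


pairs_19 = {
    "I": {"M2": "L", "F2": "W"},
    "II": {"M1": "L", "F1": "W"},
    "III": {"M2": "L", "F1": "W"},
}
result = rcd(["M1", "M2", "F1", "F2"], pairs_19)
print(result)
```

Because neither Faithfulness constraint prefers a loser, both are placed in the first pass, which yields the two-stratum hierarchy with no Markedness constraint above any Faithfulness constraint.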
Let us examine the results of confronting the data in the order I, II, III, using
ranking conservatism to guide the process. We start out with the initial state:
(20) {M1 M2} » {F1 F2}
For conciseness and visibility, we shall write this with bars separating the
strata:
(21) M1 M2 | F1 F2
Now, mark-data pair I requires F2 » M2.
(22) M1 M2 | F1 F2 → M1 | F2 | M2 | F1    PI = 1
This is the minimal perturbation, inverting only the relation between M2
and F2. But it has induced an apparent total ranking on the constraint set, of
which only the one ranking is actually motivated. Note in particular F2 » F1, the
opposite of what the M-high solution requires.
Now consider the effect of mdp II, which requires F1 » M1.
(23) M1 | F2 | M2 | F1 → F2 | M2 | F1 | M1    PI = 3
The arrival of mark-data pair III, F1 » M2, seals the case:
(24) F2 | M2 | F1 | M1 → F2 | F1 | M2 | M1 PI = 1
The desired outcome, F1 | M1 | F2 | M2, has a PI of 4 with respect to the input
ranking of (24), F2 | M2 | F1 | M1. Ranking conservatism has led us far away
from the best M » F solution.
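The PI values in (22)–(24) can be recomputed under one plausible reading of the measure: PI counts the pairs of constraints whose strict dominance relation in the old hierarchy is reversed in the new one, with pairs sharing a stratum counting as unranked. That reading, the name `pi_value`, and the list-of-strata encoding are assumptions of this sketch; it is offered only because it reproduces the figures given in the text.

```python
from itertools import combinations


def stratum_index(hierarchy):
    """Map each constraint to the index of its stratum (0 = highest)."""
    return {c: i for i, stratum in enumerate(hierarchy) for c in stratum}


def pi_value(old, new):
    """Count constraint pairs strictly ranked in `old` and oppositely ranked in `new`."""
    old_pos, new_pos = stratum_index(old), stratum_index(new)
    inversions = 0
    for a, b in combinations(old_pos, 2):
        before = old_pos[a] - old_pos[b]
        after = new_pos[a] - new_pos[b]
        if before * after < 0:  # opposite signs: the dominance relation flipped
            inversions += 1
    return inversions


initial = [["M1", "M2"], ["F1", "F2"]]        # (21)
after_i = [["M1"], ["F2"], ["M2"], ["F1"]]    # output of (22)
after_ii = [["F2"], ["M2"], ["F1"], ["M1"]]   # output of (23)
after_iii = [["F2"], ["F1"], ["M2"], ["M1"]]  # output of (24)
desired = [["F1"], ["M1"], ["F2"], ["M2"]]    # the M-high solution

print(pi_value(initial, after_i), pi_value(after_i, after_ii),
      pi_value(after_ii, after_iii), pi_value(after_ii, desired))
```

Each conservative step is cheaper than the jump to the desired hierarchy, which is why a learner minimising PI at every step never makes that jump.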
M1 M2 M3 M4 F1 F2 F3 F4
P1 L W W
P2 L W W
P3 L W W
P4 L L L W W
P5 L L L W W
P6 L L L W W
Of particular interest is the extent to which the cases where the algorithm fails
to optimise the r-measure are also cases where the r-measure fails to correlate
with actual restrictiveness. If the differences in r-measure are among hierarchies
generating identical grammars, then failure to find the hierarchy with the best
r-measure is not a problem, so long as the hierarchy actually learned is one
generating the restrictive language.
M F M F
F: F: F: F:
M:Agr stp-voi M:*b M:*z stp-voi/O fr-voi fr-voi/O
ba ∼ pa W L W
ab ∼ ap W L
azba ∼ aspa W L L W W
✯/abza/→ abza ∼ apsa W L L W W
✯/absa/→ abza ∼ apsa W L L L L
/za/ → sa ∼ za W L L
/az/ → as ∼ az W L
This is the most restrictive grammar and introduces no further z than are ob-
served. Stop-voicing spreads regressively in clusters, which are otherwise voice-
less: /abza, absa/ → apsa.
M F M F M F
F: F: F: F:
M:Agr stp-voi/O M:*z stp-voi M:*b fr-voi fr-voi/O
ba ∼ pa W W L
ab ∼ ap W L
azba ∼ aspa W L W L W
✯/abza/→ apsa ∼ abza W L W L L
✯/absa/→ apsa ∼ abza W L W W W
/za/ → sa ∼ za W L L
/az/ → as ∼ az W L
Grammar 3. F:fr-voi high. r = 9.
{M:Agree} » {F:fr-voi} » {M:*z} » {F:stop-voi} » {M:*b} »
{F:fr-voi/O, F:stop-voi/O}
This grammar preserves all /z/, and therefore predicts the widest range of
non-observed forms through new identity maps: /abza/ → abza, /za/ → za,
/az/ → az. Fricatives retain voicing everywhere and therefore fricative voicing
is dominant in clusters.
M F M F M F
F: F: F: F: F:
M:Agr fr-voi M:*z stp-voi M:*b fr-voi/O stp-voi/O
ba ∼ pa W L W
ab ∼ ap W L
azba ∼ aspa W L W L W
✯/abza/→ abza ∼ apsa W L W L W
✯/apza/→ abza ∼ apsa W L L L W
/absa/→ apsa ∼ abza W W L W W
/za/ → za ∼ sa W L W
/az/ → az ∼ as W L
notes
* The authors’ names are given alphabetically. Prince would like to thank the John
Simon Guggenheim Memorial Foundation for support. Thanks to Jane Grimshaw,
John McCarthy, Paul Smolensky, and the Rutgers Optimality Research Group for
helpful discussion.
Hayes (this volume) independently arrives at the same strategy for approaching the
learning problems dealt with here, and he offers similar arguments for the utility and
interest of the approach. We have tried to indicate by references in the text where the
identities and similarities lie. Along with the overlap in basics, there are significant
divergences in emphasis, analytical procedure, and development of the shared ideas;
we suggest that readers interested in the issues discussed below should consult both
papers.
1. The non-existence of long high vowels on the surface in Yawelmani Yokuts, coupled
with their underlying necessity, is the centrepiece of generations of commentary.
Bruce Hayes reminds us, however, that the allomorphs of the reflexive/reciprocal
suffix contain the otherwise unobserved high long vowels, indicating the need for a
more nuanced view of their still highly restricted distribution in the language.
2. Different kinds of interactions have led Dresher and Kaye (1990) and Dresher (1999)
to hypothesise elaborate paralinguistic strategies such as pre-establishing, for each
subsystem, the order in which its parameters must be set. This is meant to replace
learning from error. Ambiguity of analysis, rather than the subset relation, provides
the central difficulty they focus on.
3. Swimming against the tide, Hale and Reiss (1997) insist on F » M as the default. For
them, then, word learning yields no learning of phonology. To replace M » F learning,
they offer an ‘algorithm’ that regrettably sets out to generate and sort the entirety of
an infinite set in its preliminary stages.
4. For ease of exposition, we write as if there were only one constraint with the effect
*š and one with the effect *si. In a fuller treatment of the example, we would see that
all such constraints would be subordinated in (a) and that some M(*si) would have
to dominate all M(*š) in (b). The thrust of the argument – distinguishing between
Markedness and Faithfulness solutions to ambiguous data – is preserved amidst any
such refinements.
5. Hayes (this volume) operates under the same I = O assumption, which he attributes
to a suggestion from Daniel Albro.
6. The two diverge when a language admits forms which are not fixed points. Such
forms do not add to the number of fixed points, but do increase the number of forms
overall. A simple illustration is a language with a chain shift. Consider the following
two mappings: [x→y, y→z, z→z] and [x→z, y→z, z→z]. The two mappings have the
same number of fixed points (one), but differ in the languages produced: the first, with
the chain shift, produces {y,z} while the second produces the more restrictive {z}.
We are assuming that the learning of all non-identity mapping relations, including
chain shifts, is done at the stage of morphophonemic learning.
7. This correlates with the fact that there is typically going to be a number of different
grammars, with different patterns of Faithfulness violation, that produce the same
repertory of output forms. For example, whether syllable canons are enforced by
deletion or epenthesis cannot be determined by distributional evidence. Similarly, in
References
Angluin, D. (1980). Inductive inference of formal languages from positive data. Infor-
mation and Control 45. 117–135.
Baker, C. L. (1979). Syntactic theory and the projection problem. LI 10: 4. 533–581.
Beckman, J. (1998). Positional Faithfulness. Ph.D. dissertation, University of Mas-
sachusetts, Amherst. [ROA 234, http://roa.rutgers.edu]
Shigeko Shinohara
1. Introduction
There has been a renewal of interest in the study of loanword phonology since
the recent development of constraint-based theories. Such theories readily ex-
press target structures and modifications that foreign inputs are subject to (e.g.,
Paradis and Lebel 1994, Itô and Mester 1995a,b). Depending on how the for-
eign sounds are modified, we may be able to make inferences about aspects of
the speaker’s grammar for which the study of the native vocabulary is either
inconclusive or uninformative. At the very least we expect foreign words to
be modified in accordance with productive phonological processes and con-
straints (Silverman 1992, Paradis and Lebel 1994). It therefore comes as some
surprise when patterns of systematic modification arise for which the rules
and constraints of the native system have nothing to say or even worse contra-
dict. I report a number of such ‘emergent’ patterns that appear in our study of
the adaptations of French words by speakers of Japanese (Shinohara 1997a,b,
2000). I claim that they pose a learnability problem. My working hypothesis
is that these emergent patterns are reflections of Universal Grammar (UG).
This is suggested by the fact that the emergent patterns typically correspond to
well-established cross-linguistic markedness preferences that are overtly and
robustly attested in the synchronic phonologies of numerous other languages. It
is therefore natural to suppose that these emergent patterns follow from the de-
fault parameter settings or constraint rankings inherited from the initial stages
of language acquisition that remain latent in the mature grammar. Thus, the
study of foreign word adaptations can not only probe the final-state grammar
but may also shed light on the initial state. The implication for L2 acquisition is
that UG latent in L1 is accessible in a later stage in life (cf. Epstein, Flynn, and
Martohardjono 1996, see Broselow and Park 1995, Broselow, Chen, and Wang
1998 reporting similar UG emergent patterns in their studies of inter-language
phenomena). Of course, this hypothesis awaits confirmation from acquisition
studies.
Under the assumption made above about the Emergence of the Unmarked in
foreign word adaptation, it is possible to derive evidence bearing on the default
Universal Grammar in foreign word adaptations 293
On the segmental level, Japanese segments are substituted for French ones.
Japanese syllable structure is relatively restricted; (3) below provides a list of the
licit syllable types. When the input loanword contains a consonantal sequence
that cannot be parsed into one of these syllable types, vowel epenthesis applies
to allow the consonants to be syllabified.
(3) Syllable structure of Japanese
Light syllable: (C)(j)V (ex. /ha/ ‘tooth’, /tja/ ‘tea’)
Heavy syllables: (C)(j)VV (ex. /too/ ‘tower’, /dai/ ‘title’)
(C)(j)VN (ex. /teN/ [teŋ] ‘point’)
(C)(j)VQ (ex. /haQ.pa/ [hap a] ‘leaf’)
(N: moraic nasal)
(Q: moraic obstruent; the first half of a geminate obstruent)
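The syllable canon in (3) can be stated as a single pattern over phonemic transcriptions. The sketch below is only illustrative: the consonant and vowel inventories are simplified assumptions, and the names `SYLLABLE` and `is_licit_syllable` are hypothetical. It checks one syllable at a time and does not model epenthesis itself.

```python
import re

VOWELS = "aiueo"
CONSONANTS = "kgsztdnhbpmrw"  # assumed, simplified inventory for illustration

# (C)(j)V optionally followed by a second vowel, the moraic nasal N,
# or the moraic obstruent Q, following the template in (3).
SYLLABLE = re.compile(rf"[{CONSONANTS}]?j?[{VOWELS}](?:[{VOWELS}]|N|Q)?")


def is_licit_syllable(s):
    """Return True if s is exactly one syllable of a type listed in (3)."""
    return SYLLABLE.fullmatch(s) is not None


for form in ("ha", "tja", "too", "dai", "teN", "haQ", "bak"):
    print(form, is_licit_syllable(form))
```

A form such as "bak", with a plain consonant in coda position, fails the check; this is the kind of string that the adaptation process repairs by vowel epenthesis.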
On the syllabic level, the phenomenon I shall call ‘pre-final lengthening’ is also
observed. When a foreign word ends in a syllable closed by a single consonant,
as in |bak| bac, the prefinal syllable of the adapted form is lengthened by a
gemination, as in /bakku/. On the accentual level, pitch accent is assigned. I
shall present some of the adaptation phenomena that appeared in each level in
later sections.
3. Constraint interaction
In this section, I shall explain briefly the theoretical framework that I shall use
and its significant consequences. The data are analysed in the Optimality frame-
work (Prince and Smolensky 1993, McCarthy and Prince 1995). OT defines the
grammar of a language as a hierarchy of universal constraints. The constraints
are divided into two broad categories: Structural (or markedness) constraints
reflecting unmarked forms as defined by UG; faithfulness constraints for preser-
vation of input properties.
I shall illustrate the interaction between the two types of constraints by taking
the example of plosive assibilation. Assibilation is a process whereby a plosive
becomes an affricate in certain phonological environments (cf. Walter 1988).
For example, [t] alternates with [ts] before high front vowels in Quebec French.
The feature change in these segments is expressed by using aperture features
(Steriade 1993):
part [Amax] to the release stage. An affricate has the closure in common with a
plosive, but the release is with friction, hence, its feature specification is [A0 Af].
Turning to Japanese, let us formulate two constraints in conflict regarding
the assibilation of alveolar plosives:
(5) Assibilation (*TU): No coronal plosive is followed by a high
nonback vowel.2
Ident-[Amax]: the release feature [Amax] of a plosive must be identical
in the input and the output.
Assibilation above is a Markedness constraint. Ident-[Amax] belongs to the
Faithfulness constraint family. In tableau 1, the input /mat-u/ ‘wait, non past’
from a Japanese /t/-ending verb stem followed by the suffix /-u/ contains a /tu/
sequence. Suppose that this input /t/ has the plosive release feature [Amax]. When
the Markedness constraint dominates Faithfulness, the grammar neutralises the
release feature of /t/, and gives the output [matsu], with assibilation:
Tableau 1
             Markedness   Faithfulness
mat-u        *TU          Ident-[Amax]
matu         *!
→ matsu                   *
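The evaluation in tableau 1 can be sketched in a few lines. The sketch assumes the usual OT notion of strict domination, implemented here as a lexicographic comparison of violation counts ordered by rank; the function name `optimal` and the dictionary encoding of the tableau are illustrative choices. Running it under either ordering of the two constraints shows how the ranking alone decides the output.

```python
def optimal(candidates, ranking):
    """Return the candidate with the best violation profile under strict domination.

    candidates maps each output form to {constraint: number of violations}.
    Tuples ordered by rank compare lexicographically, so a violation of a
    higher-ranked constraint outweighs any number of lower-ranked ones.
    """
    def profile(form):
        return tuple(candidates[form].get(c, 0) for c in ranking)
    return min(candidates, key=profile)


tableau_1 = {
    "matu": {"*TU": 1},            # keeps the plosive release, violates *TU
    "matsu": {"Ident-[Amax]": 1},  # assibilates, violates Faithfulness
}

m_over_f = optimal(tableau_1, ["*TU", "Ident-[Amax]"])
f_over_m = optimal(tableau_1, ["Ident-[Amax]", "*TU"])
print(m_over_f, f_over_m)
```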
If the constraints are ranked in the reversed order, Ident-[Amax] » *TU, the
consonant [t] will preserve its release feature, and the output is [matu], without
assibilation, as shown below.
Tableau 2
Faithfulness Markedness
Let us first turn to the analysis of assibilation and mutation in the native lex-
icon. I assume the universal constraint *Affricate, reflecting the marked-
ness of affricates as opposed to plosives. To explain the fact that the mutation
is observed for /d/ but not for /t/, I shall split *Affricate into two sub-
categories:
(7) *DZ: Do not have voiced affricates
*TS: Do not have voiceless affricates
By using these two constraints, the mutation for /d/ and the absence of the
mutation for /t/ are explained as follows. On the one hand the faithfulness constraint
for the closure feature, Ident-[A0] (see (8)), dominates *TS, as illustrated in
tableau 3 below.
(8) Ident-[A0]: the feature [A0] of corresponding segments in the input
and in the output must be identical.
Tableau 3
/ . . . tu/
Tableau 4
/. . . du/
The choice of vowels in each case is explained as follows. The [high] feature
of the vowel |u| is more important than the appearance of the voiceless affricate
[ts], thus the adapted form /tsuuruuzu/ preserves the high vowel /u/, as shown
in tableau 6.
Tableau 6
Toulouse /tsuuruuzu/
In /poNpidoo/, by contrast, the more marked voiced affricate is avoided at the cost
of sacrificing the vowel identity. This is easily described in the OT model as
the insertion of the Faithfulness constraint Ident-[high] within the Markedness
preference ranking for voiceless over voiced affricates: *DZ » Ident-[high] » *TS.
Tableau 7 illustrates the effect of this ranking.
Tableau 7
Pompidou /poNpidoo/
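The claim that Ident-[high] must sit between the two affricate constraints can be checked by trying each of its possible positions relative to the fixed ranking of *DZ over *TS. In the sketch below, the winning forms are those reported in the text, whereas the losing candidates (/tooruuzu/, /poNpidzuu/) are hypothetical forms constructed for illustration; candidates retaining unassibilated [tu] or [du] are assumed to be excluded already by a higher-ranked constraint and are left out.

```python
def winner(candidates, ranking):
    """Pick the candidate with the best violation profile under strict domination."""
    return min(candidates,
               key=lambda form: tuple(candidates[form].get(c, 0) for c in ranking))


# Attested winners come from the text; the competing forms are hypothetical.
toulouse = {"tsuuruuzu": {"*TS": 1}, "tooruuzu": {"Ident-[high]": 1}}
pompidou = {"poNpidzuu": {"*DZ": 1}, "poNpidoo": {"Ident-[high]": 1}}
attested = ("tsuuruuzu", "poNpidoo")

fixed = ["*DZ", "*TS"]  # the Markedness preference: *DZ dominates *TS
outcomes = {}
for slot in range(len(fixed) + 1):
    ranking = fixed[:slot] + ["Ident-[high]"] + fixed[slot:]
    outcomes[" » ".join(ranking)] = (winner(toulouse, ranking),
                                     winner(pompidou, ranking))

matching = [r for r, result in outcomes.items() if result == attested]
print(matching)
```

With Ident-[high] on top, both inputs would surface with an affricate; with it at the bottom, both would lower the vowel. Only the intermediate position separates the two cases.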
In sum, we have reviewed two quite distinct places in the grammar of Japanese
where the preference for voiceless affricates over voiced ones asserts itself. First,
in the native vocabulary intervocalic /t/ before an /u/ is realised as an affricate
(/natu/ → [natsu] ‘summer’) while /d/ is spirantised (/mikaduki/ → [mikazuki]
‘increasing moon’). Second, in loanword adaptation /tu/ is realised with an af-
fricate (Toulouse → [tsuuruuzu]) while /du/ changes the vowel (Pompidou →
[poNpidoo]). While it could be argued that the former process establishes
the *DZ » *TS ranking, it is implausible that the change of vowel that
we see in Pompidou → Pompi[doo] is a direct consequence of the avoid-
ance of [dz] seen in /mikaduki/ → [mikazuki]. After all, spirantisation and
lowering of a high vowel are quite different phonological substitutions. But
viewed in the OT constraint-based approach, both phenomena reflect the marked
status of voiced affricates relative to voiceless ones that is encoded in the
fixed ranking *DZ » *TS that by hypothesis forms part of the initial stage
of acquisition. If this reasoning is correct, then we see an aspect of UG
emerging in the adaptation process and the latter provides a window on the
former.
(15) English stress and pitch accent placement of the adapted forms of
English words
Nouns         Adapted forms      Verbs          Adapted forms
pícnic        /pi’kunikku/       mátter         /ma’taa/
ínfluence     /i’NhurueNsu/      órganize       /o’oganaizu/
beháviour     /bihe’ibijaa/      invéstigate    /iNbe’sutigeito/
techníque     /tekuni’iku/       implý          /iNpura’i/
dífficulty    /di’fikarutii/     contríbute     /koNtori’bjuuto/
the adapted forms of French words, syllables containing epenthetic vowels count
as prosodic constituents relevant for accent placement; however, the accent
seems to shift when the predicted position is filled by an epenthetic vowel.8
In (22) and (23), adapted forms end in a light syllable, thus we expect that
the accent falls on the antepenultimate mora. As shown in (22), when an input
contains a non-epenthetic vowel in this position, the accent is placed on the
antepenultimate mora as predicted. Epenthetic vowels are marked by italics.
(22) (a) otarie |ota i| ‘sea-lion’ /o’tari/
(b) mâchicoulis |maʃikuli| ‘machicolation’ /masi’kuri/
(c) alerte |alε t| ‘alert’ /are’ruto/
(d) travesti |t avε sti| ‘travesty’ /torabe’suti/
Notice in examples (c) and (d) above that the epenthetic vowels after the
antepenultimate mora are counted for foot construction, otherwise the accent
would fall on a syllable closer to the beginning of the forms as in */a’reruto/
and */tora’besuti/.
Now we observe the case where the accent would fall on an epenthetic vowel.
When the antepenultimate mora of an adapted form does not correspond to a
French vowel in the input, the accent shifts, as shown in (23).
(23) (a) stylo |stilo| ‘pen’ /suti’ro/
(b) patronat |pat ona| ‘patronate’ /patoro’na/
(c) abricot |ab iko| ‘apricot’ /aburi’ko/
(d) cercle |sε kl| ‘circle’ /se’rukuru/
Across languages epenthetic or ‘weak’ vowels tend to avoid the accent. Here
are a few examples from the literature:
The tendency to avoid accent on weak vowels clearly looks like the effect of
a markedness constraint. But in order to distinguish an epenthetic vowel from
a phonetically identical vowel that has an input correspondent, the constraint
must evidently refer to the input. I shall not discuss this point here and simply
adopt the constraint in (24) from Alderete (1995) that expresses this type of
input–output faithfulness relationship.
In our case Head(Foot)-Dep is high ranking and it will force the accent
to avoid an epenthetic vowel as illustrated in tableau 9.
Tableau 9
abricot /aburi’ko/
1. → aburi’ko *
2. abu’riko *!
and Smolensky 1993). In my data, however, there is not any syllable quantity
change. What we find instead is a variation in accent location.9 According to the
analysis in Shinohara (1997a,b, 2000), we obtain the following foot grouping
in the winning candidates: (H )L and H(L ) (or (H L) and H(L ) depending
on the particular analysis).
The sequence HL is difficult for a bimoraic foot parsing. If the bimoraic
structure is the one preferred by UG for trochaic accent, this difficulty
should be reflected in some way or another. In the default accentuation of
Japanese (identified in French word adaptations), it is reflected by variability in
grouping.
The variation in this particular case is explained as follows. The choice depends
on whether the vowel is recognised by the speaker as a lengthening vowel or
not. If it is, then the vowel is lengthened; otherwise, the regular consonant
gemination in (a) applies.
(27) Align-R (Stem, R, Syllable, R): The right edge of the stem in the
input must be aligned with the right edge of the syllable in the output.
(Tsuchida to appear: 17 (34))12
The lengthening of consonants observed in (a) and (c) in section 6.1 above is
explained by this alignment: by geminating the final consonant of the input
|nap| it becomes a coda, thus it satisfies constraint (27). The second half of the
geminate occupies the onset of the epenthetic syllable to satisfy the Onset
constraint. This analysis explains nicely why the gemination does not appear
for the vowel-ending words (cf. (28a) below)13 nor for word-medial consonants
(cf. 28b and 28c).
1. → rak.ku *
2. ra.ku *!
Tableau 11
Dep-µ Align-R (Sino-Japanese)
1. → betu *
2. bet.tu *!
The native stratum, on the other hand, does not provide any evidence for rank-
ing these two constraints given that input noun stems do not end in a consonant.
According to standard assumptions, in the initial state of UG Markedness is
ranked over Faithfulness. In the Sino-Japanese grammar we observe Faithful-
ness (Dep-µ) over Markedness (Align-R). Therefore, a re-ranking from the
initial state must have taken place in the Japanese grammar through positive
evidence provided by the Sino-Japanese lexicon. However, in adaptation of
loanwords M ranks over F, as in the initial state. The question thus arises as
to why the adaptation grammar (M » F) involves a different ranking from the
Sino-Japanese grammar (F » M) and furthermore one that goes in the opposite
direction from the learning algorithm. Let us consider this problem together
with the grammar structure of lexical strata. One possible interpretation is that
F » M is specific to the Sino-Japanese stratum, while in other strata lacking
positive evidence for re-ranking, M » F remains. There are other features such
as size restrictions that identify morphemes as belonging to the Sino-Japanese
sector. Adaptations from French and English typically lack these features and
hence are not assigned to the Sino-Japanese class. If we make this assumption
then we can say that the M » F ranking of Align-R » Dep-µ inherited
from the initial state remains latent in the mature grammar and emerges when
given the appropriate inputs. However, under the core–periphery view of lexical
strata proposed by Itô and Mester (1995a,b) the Sino-Japanese grammar
stands between the native Yamato and the adaptation grammar. Furthermore,
Itô and Mester (1995b) claim that strata differ simply by the relative ranking of
Faithfulness and Markedness with more F » M rankings in peripheral sectors. On this
view then the passage from Sino-Japanese grammar to the adaptation grammar
would involve contradictory re-rankings: initial/native M » F → Sino-Japanese
F » M → adaptation M » F. We must leave this problem for a future investigation.
It will be interesting to see if such stem-final gemination that we see in
adaptation arises in primary language acquisition.
We can block the incorrect /ro.zu/ and allow the correct /roo.zu/ to win by
invoking a special Faithfulness constraint for moraic weight to the sympathetic
(indicated by ❀ in tableau 13) but failed candidate (see (31)).
(31) ❀-Faith-µ: Every mora of the flowered candidate must have a
correspondent in the output.
If this constraint ranks above Dep-µ, it will block /ro.zu/ because the latter
lacks the crucial extra mora which /roz.zu/ has.
Tableau 13
roz# -Faith-µ Dep-µ
1. roz.zu *
2. → roo.zu *
3. ro.zu *!
Tableau 15
Sub-grammar B: Align-R » *DD
emerged in the adaptation process in spite of the fact that the reversed order
(F » M) is active in the Sino-Japanese stratum. A relevant issue worthy of
further study is the ‘core and periphery’ structure of stratified lexicon (Itô and
Mester 1995a,b).
Two general issues for further research arise from our consideration of these
UG-emergent patterns. One issue is whether similar adaptations emerge in
other languages. This is predicted to be the case if they reflect constraint rank-
ings/parameter settings that are inherited from the initial state of UG and remain
latent in the mature grammar. Another issue is the extent to which these
UG-reflecting patterns also emerge in second language acquisition, which is also
predicted to occur on the grounds that aspects of the initial state remain in the
mature L1 grammar.
notes
* I wish to thank René Kager, Wim Zonneveld, the audience at the Utrecht Workshop
June 1998 and participants in the Phonology Circle in November 1998 at MIT for
useful comments. Special thanks are due to Michael Kenstowicz for helpful discus-
sions on the topics discussed in this chapter during my visit to MIT. I thank Ruben
van de Vijver and Joe Pater for carefully reading the manuscript and making valuable
comments and suggestions. But I am solely responsible for any errors. My son was
born during the revision of this chapter, which encouraged me to continue my studies
of phonological acquisition.
1. Hereafter, the input phonemic sequences are presented in vertical lines | |; the adapted
forms in Japanese phonemic sequences are indicated by / /; [] indicates any phonetic
output. Pitch accents are marked by an apostrophe after the accented mora.
2. Note that the Japanese /u/ phoneme is realised as a relatively centralised high vowel.
One may suspect that assibilation represents a process of assimilation between [t/d]
and high/frontness of the following vowel. I must, however, leave the nature of this
phenomenon for a topic of future investigation.
3. Assibilation is violated in relatively new loanwords as well as in my data from adapta-
tion: input sequences |t/d| followed by a high vowel are adapted without assibilation:
/tudei/ < Today (part of the title of a column in a newspaper), /dubai/ < Dubai
(loanword); /tuzjuuru/ < |tuuR| toujours ‘always’, /duuzu/ < |duz| douze ‘twelve’
(French word adaptations). Here Ident-[A0 ] is promoted above *TU. When the
vowel is epenthetic, in both the relatively old and the recent loan classes the vowel is
lowered to /o/ from the regular epenthetic vowel /u/ found in the other environments
(in adaptation, it fluctuates between /o/ and /u/), but this time lowering occurs after
both /t/ and /d/ (in the examples below, epenthetic vowels are marked by italics).
(a) /katoriinu donuubu/ < |katRin də nœv| Catherine Deneuve ‘personal name’, loan-
word from French
(b) /batto/ < |bat| bat, loanword from English
(c) /esutoraddo/ < |estRad| estrade, ‘stage’, adaptation (with variation /esuturaddu/).
In epenthetic vowels, vowel features are absent in the input; consequently Ident-
[high] does not determine the winner. Between candidates 2 and 3 in the tableau
below, candidate 2 is the winner with respect to *TS, unlike the case of /tsuuruuzu/
< Toulouse with a lexical /u/ that violates *TS and respects higher ranked Ident-
[high].
bat/batto/
1. battu *!
2. → batto
3. battsu *!
4. The Japanese writing system itself does not indicate the pitch accent. My informants
for the English data were able to identify accent locations and also knew the notation.
5. Established loanwords from English do not always carry accent on the syllable cor-
responding to the original accent position. Such words are often accented according
to the default accentuation that we shall discuss shortly: /hariu’ddo/ < Hóllywood
(McCawley 1968, Haraguchi 1991, Suzuki 1995, Katayama 1995, 1997). Some oth-
ers are unaccented: /kariforunia/ < Califórnia. Non-uniformity of accent patterns
in loanwords may be due to dialectal variability of adapters and to the fact that they
can be accented on the basis of written forms by a speaker who has no access to the
source language.
6. Some of the four-mora forms of prosodically derived words, namely the abbreviated
compounds and words in the reverse language, can be analysed as comprising two
feet: (waa)(puro) ‘word processor’, (hi:)(ko:) ‘coffee’. These four-mora forms are
systematically unaccented. I therefore suspect a relationship between the two-foot
structure and the lack of accent. The presence of only one foot might be a condition
for the accent, but I leave this as a topic for future investigation.
7. In compound words the accent can be aligned with a morpheme boundary:
kami’kaze < ka’mi ‘divine’ + kaze ‘wind’. Consequently, in Sino-Japanese compounds
where a morpheme can end in a consonant (Itô 1986) an epenthetic vowel can
receive the boundary accent: gaku’moN < gak+moN ‘study’. I think in this case
the alignment constraint overrides the constraint barring the accent on an epenthetic
vowel, which will be discussed shortly. This is consistent with multi-morphemic
adapted forms: /suburenu’te/ < souveraine-té |suvRente| ‘sovereignty’.
8. There are loanwords accented on an epenthetic vowel: kurisu’masu < Christmas.
When loanwords are written in Japanese, there is no indication of epenthetic status
of vowels. Loanwords can be accented on the basis of the written form by a speaker
who has no access to the source sounds.
9. In Japanese native proper names where the default accentuation is detectable,
an HLH sequence mostly receives initial accent (e.g., /ka’Nziroo/, /ka’NemoN/),
whereas HLL seems to be uniformly unaccented. However, the data are too scanty
to conclude that L1 default patterns on the HLH shape are distinct from the result
in the adaptation data.
10. This implies that the unaccented class in Japanese must be considered ‘marked’:
given the choice, it is more optimal for a word to have an accent than to lack one.
11. A somewhat parallel case is observed for word-final obstruent-liquid clusters, while
other word-final clusters do not trigger any lengthening. We shall not discuss cases
with word-final clusters in this chapter; see Tsuchida (to appear) among others for
similar phenomena in loanwords from English, and Shinohara (1996, 1997b) for
French adaptations.
12. As is stated, this constraint forces an alignment between a stem in the input and a
syllable edge in the output, which is at odds with the standard OT assumption that
the output does not refer to the input properties. However, we shall not be concerned
with this problem here and we shall use Align-R to mean the alignment between
the stem and the syllable on the output level.
13. Exception: when the vowel is a ‘lengthening’ one as mentioned in section 6.1, a
vowel can optionally be long in any position.
14. Sino-Japanese morphemes constitute their own stratum: compared to the native
lexicon, Sino-Japanese morphemes allow freer distribution of segments but their
size is limited to two moras.
References
Alderete, J. (1995). Faithfulness to prosodic heads. MS., University of Massachusetts,
Amherst.
Anttila, A. (1997). Deriving variation from grammar. In F. Hinskens, R. van Hout, and
L. Wetzels (eds.) Variation, Change and Phonological Theory. Amsterdam: John
Benjamins. 35–68.
Brame, M. (1973). On stress assignment in two Arabic dialects. In S. Anderson and
P. Kiparsky (eds.) A Festschrift for Morris Halle. New York: Holt. 14–25.
Broselow, E. and H.-B. Park (1995). Mora conservation in second language prosody. In
J. Archibald (ed.) Phonological Acquisition and Phonological Theory. Hillsdale:
Erlbaum. 151–168.
Broselow, E., S.-I. Chen, and C. Wang (1998). The Emergence of the Unmarked in second
language phonology. Studies in Second Language Acquisition 20. 261–280.
Davidson, L., P. Jusczyk, and P. Smolensky (this volume). The initial and final states:
theoretical implications and experimental explorations of Richness of the Base.
Epstein, S. D., S. Flynn, and G. Martohardjono (1996). Second language acquisition:
theoretical and experimental issues in contemporary research. Behavioral and Brain
Sciences 19. 677–758.
Fukazawa, H. (1997). Multiple Input-Output Faithfulness relations in Japanese.
[ROA 260, http://roa.rutgers.edu].
Gnanadesikan, A. (this volume). Markedness and faithfulness constraints in child
phonology.
Hagstrom, P. (1997). Contextual metrical invisibility in Mohawk and Passamaquoddy.
In B. Bruening, Y. Kang, and M. McGinnis (eds.) PF: Papers at the Interface, MIT
Working Papers in Linguistics 30. 113–182.
Haraguchi, S. (1991). A Theory of Stress and Accent. Dordrecht: Foris Publications.
Hayes, B. (this volume). Phonological acquisition in Optimality Theory: the early stages.
Hirozane, Y. (1992). Perception by Japanese speakers of some English sounds as the
Japanese choked sound /Q/. The Bulletin of the Phonetic Society of Japan 201.
15–19.
Universal Grammar in foreign word adaptations 319
Paradis, C. and C. Lebel (1994). Constraints from segmental parameter settings in
loanwords: core and periphery in Quebec French. Toronto Working Papers in Linguistics
13: 1. 75–94.
Poser, W. (1990). Evidence for foot structure in Japanese. Language 66. 78–105.
Prince, A. (1975). The Phonology and Morphology of Tiberian Hebrew. Ph.D. disserta-
tion, Cambridge, Mass.: MIT.
Prince, A. and P. Smolensky (1993). Optimality Theory: constraint interaction in gen-
erative grammar. MS., Rutgers University.
Prince, A. and B. Tesar (this volume). Learning phonotactic distributions.
Reynolds, B. (1994). Variation and Phonological Theory. Ph.D. dissertation, Philadel-
phia, Penn.: University of Pennsylvania.
Shinohara, S. (1996). The roles of the syllable and the mora in Japanese adaptations of
French words. Cahiers de Linguistique – Asie Orientale 25: 1. 87–112.
(1997a). Default accentuation and foot structure in Japanese: analysis of Japanese
adaptation of French words. In B. Bruening, Y. Kang, and M. McGinnis (eds.) PF:
Papers at the Interface, MIT Working Papers in Linguistics 30. 263–290.
(1997b). Analyse phonologique de l’adaptation japonaise de mots étrangers. Ph.D.
dissertation, Université Paris III. [ROA 243, http://roa.rutgers.edu]
(2000). Default accentuation and foot structure in Japanese: evidence from Japanese
adaptations of French words. Journal of East Asian Linguistics 9: 1. 55–96.
(2002). Metrical constraint and word identity in Japanese compound words. In
A. Csirmaz, Z. Li, A. Nevins, D. Vaysmann, and M. Wagner (eds.) Phonological
Answers (and their Corresponding Questions). MIT Working Papers in Linguistics
42. 311–325.
Silverman, D. (1992). Multiple scansions in loanword phonology: evidence from Can-
tonese. Phonology 9. 289–328.
Steriade, D. (1993). Closure, release and nasal contours. In M. Hoffman and R. Krakow
(eds.) Nasals, Nasalization and the Velum. San Diego: Academic Press. 401–470.
Suzuki, H. (1995). Minimal words in Japanese. Proceedings of CLS 31. 448–463.
Tateishi, K. (1991). Les implications théoriques du langage des musiciens japonais.
Langages 101. 51–72.
Teoh, B. S. (1987). Geminates and inalterability in Malay. Studies in the Linguistic
Sciences 17: 2. 125–136.
Tsuchida, A. (to appear). English loans in Japanese: constraints in loanword phonology.
MS., Cornell University, Ithaca.
Walter, H. (1988). Le Français dans tous les sens. Paris: Robert Laffont.
10 The initial and final states: theoretical
implications and experimental explorations of
Richness of the Base
In this chapter we present the initial stages of work that attempts to assess
the ‘psychological reality’ of one of the more subtle grammatical principles of
Optimality Theory (OT; Prince and Smolensky 1993), Richness of the Base.
Within the OT competence theory, we develop several of this principle’s empir-
ical predictions concerning the grammar’s final state (section 1) and initial state
(section 2). We also formulate linking hypotheses which allow these predictions
concerning competence to yield predictions addressing performance. We then
report and discuss the results of experimental work testing these performance
predictions with respect to linguistic processing in infants (section 3) and adults
(section 4).
1. Introduction
Optimality Theory (henceforth OT) is a highly output-oriented grammatical
theory. The strongest hypothesis is that all systematic, language-particular pat-
terns are the result of output constraints – that there is no other locus from which
such patterns can derive. In particular, the input is not such a locus. Thus, for
example, the fact that English words never begin with the velar nasal ŋ cannot
derive from a restriction on the English lexicon barring ŋ-initial morphemes.
Rather, it must be the case that the English grammar forces all its outputs to
obey the prohibition on initial ŋ. This requirement amounts to a counterfactual:
even if there were an ŋ-initial lexical entry in English, providing an ŋ-initial
input, the corresponding output of the English grammar would not be ŋ-initial.
Thus, the absence of ŋ-initial words in English must be explained within OT by
a grammar – a ranking of constraints – with the property that no matter what the
input, the output of the grammar will not be ŋ-initial. That is, the OT analysis
of English must consider a set of inputs to the grammar – the base – that is
as rich as possible: the base includes all universally possible inputs, including
those that are ŋ-initial, those that contain clicks, those that consist of seventeen
consecutive consonants, etc.
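The counterfactual reasoning above can be made concrete with a toy evaluator. The following is only a minimal sketch under stated assumptions: the constraint functions, the two-member candidate set, and the repair of initial ŋ to n are all hypothetical simplifications introduced for illustration, not an analysis of English.

```python
# Toy OT evaluation. All names and the candidate set are illustrative assumptions.

def no_initial_eng(inp, out):
    """Markedness: one violation if the output begins with the velar nasal."""
    return 1 if out.startswith("ŋ") else 0

def ident(inp, out):
    """Faithfulness: one violation per segment that differs from the input."""
    return sum(a != b for a, b in zip(inp, out)) + abs(len(inp) - len(out))

def candidates_for(inp):
    """Toy candidate set: the faithful form plus one repair (initial ŋ -> n)."""
    cands = {inp}
    if inp.startswith("ŋ"):
        cands.add("n" + inp[1:])
    return sorted(cands)

def optimal(inp, ranking):
    """Pick the candidate whose violation profile is lexicographically smallest,
    reading the constraints in ranking order (highest ranked first)."""
    return min(candidates_for(inp),
               key=lambda out: tuple(c(inp, out) for c in ranking))

M_OVER_F = [no_initial_eng, ident]   # Markedness » Faithfulness
F_OVER_M = [ident, no_initial_eng]   # Faithfulness » Markedness

print(optimal("ŋa", M_OVER_F))  # the ŋ-initial input surfaces without initial ŋ
print(optimal("ŋa", F_OVER_M))  # the ŋ-initial input surfaces faithfully
```

Under the first ranking no input, however rich the base, yields an ŋ-initial output in this toy system; under the second ranking the gap could only be maintained by restricting the lexicon, which is what Richness of the Base rules out.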
This principle – Richness of the Base – means that it is not sufficient that
the ranking constituting the English grammar derive the correct outputs from
322 Lisa Davidson, Peter Jusczyk, and Paul Smolensky
As pointed out in Hayes (this volume) and Prince and Tesar (this volume),
the pressure that Richness of the Base imposes for Markedness » Faithfulness
is not limited to the initial state. Through interaction with other constraints, re-
ranking that occurs during learning can disturb the initial ranking M » F even
in a language lacking marked structures, where this ranking must prevail in the
end. These authors propose various mechanisms by which the pressure towards
Markedness » Faithfulness can be continually imposed by a learning algorithm,
reinstating this aspect of the initial state, except where this pressure is overcome
by positive evidence that F » M – evidence provided by the presence of marked
structures in the target language.
Opposing the line of argument we have just summarised is another view.
From the perspective of a developmental psychologist, one might have had an
entirely different expectation: that, at the outset, faithfulness constraints should
dominate markedness constraints (see also Hale and Reiss 1997). According
to this alternative point of view, the simplest assumption that an infant might
make is that surface forms faithfully mirror their underlying structure. Only
in the face of evidence to the contrary would the learner assume that there is
a more complicated relationship between surface forms and their underlying
representations. The assumption that surface forms faithfully mirror underlying
forms amounts to the assumption that faithfulness constraints outrank marked-
ness constraints; this would then be the initial state as infants begin to acquire
a native language. In this alternative theory, inaccuracies in the earliest produc-
tions of words would be ascribed not to grammar, but to difficulties that infants
have in co-ordinating the actions of the articulators.
As a new source of evidence complementing these theoretical arguments
concerning the character of the initial state, we have conducted experimental
studies to evaluate the relative ranking of Markedness and Faithfulness in the
earliest grammars.
On this analysis, the adult English ranking is MNP » FNP. There is evidence in
the language for this ranking from alternations, but these are highly limited, ap-
plying to in- ‘not’ (before labial stops only) and perhaps also to con- ‘together’
(contemporary, compatriot, congruence). It seems reasonable to assume that
children cannot exploit this evidence during their first two years, the period
of time we examine in the experiments discussed below. So for all intents and
purposes, for infants, English nasal assimilation presents the difficult case of
an unmarked inventory without alternations – the case for which learnability
requires that the initial ranking be Markedness » Faithfulness. Furthermore, at
the age of the youngest children we tested (4.5 months), the literature shows no
evidence of the acquisition of language-particular phonotactics (Jusczyk et al.
1993, Jusczyk et al. 1994, Mattys and Jusczyk in press). Thus it is unlikely
that any behaviour we see in these infants is the result of statistical analysis
of English consonant clusters. Any evidence we find that these youngest in-
fants rank Markedness above Faithfulness would seem to support the nativist
learnability argument that this is knowledge encoded in the initial state.
Thus we ask the experimental question: do infants show evidence of the
ranking Markedness » Faithfulness? Focusing on nasal place assimilation, our
goal then is to determine whether the youngest English learners give evidence
of observing each type of constraint, and if so, to determine how they rank these
two types of constraints with respect to one another.
X . . . Y . . . XY
where XY is a concatenation of X and Y in which some sound
changes may have occurred (e.g., on . . . pa . . . ompa).
Such a triad ‘conforms to a grammar’ if there is a choice of inputs
/x/ and /y/ such that X, Y, and XY are respectively the output of
the grammar for the three inputs /x/, /y/, and /xy/, in which /xy/
is literally the concatenation of x and y (possibly separated by
a morphological boundary). (Thus, depending on the ranking of
Faithfulness in the grammar, the conforming XY may be a faithful
concatenation of X and Y, or an unfaithful concatenation reflecting
phonology.)
Relative version: One set of such stimuli A ‘conforms better’ to
a grammar than another set B if A has higher Harmony accord-
ing to the grammar. The inputs /x/ and /y/ are presumed chosen
to maximize Harmony. (Thus on . . . pa . . . ompa conforms better
to a grammar than does on . . . pa . . . onpa if the grammar assigns
higher Harmony to the mapping /onpa/ → ompa than it does to
the input–output mapping /onpa/ → onpa.)
Hypothesis (2b) suffices for evaluation of the hypothesis (1a) that markedness
constraints are evidenced in the initial state, because markedness constraints
involve only outputs. The more complex hypothesis (2c) is necessary, however,
for assessing Faithfulness and its ranking relative to Markedness, since inputs
as well as outputs of the grammar must be involved. Use of triad stimuli of the
form described in (2c) will be referred to as the X/Y/XY paradigm.
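The relative version of the linking hypothesis can be sketched computationally. The code below is an illustrative sketch only: the segment inventory, the function names, and the simplifying assumption that the inputs /x/ and /y/ are identical to the heard monosyllables X and Y are ours, introduced for the example. Harmony is compared by reading violation profiles in ranking order.

```python
# Hypothetical mini-inventory; place labels are for illustration only.
PLACE = {"m": "labial", "p": "labial", "b": "labial",
         "n": "coronal", "t": "coronal", "d": "coronal"}
NASALS = {"m", "n"}

def m_np(inp, out):
    """Markedness: count nasal + consonant sequences that disagree in place."""
    return sum(1 for a, b in zip(out, out[1:])
               if a in NASALS and b in PLACE and PLACE[a] != PLACE[b])

def f_np(inp, out):
    """Faithfulness: count nasals whose place differs between input and output."""
    return sum(1 for a, b in zip(inp, out)
               if a in NASALS and b in NASALS and a != b)

def profile(x, y, xy, ranking):
    """Violation profile of the mapping /x+y/ -> XY, highest-ranked constraint first.
    Assumption: the inputs are taken to be the heard forms X and Y themselves."""
    inp = x + y
    return tuple(constraint(inp, xy) for constraint in ranking)

def conforms_better(triad_a, triad_b, ranking):
    """True if triad A has higher Harmony than triad B under the ranking,
    i.e. a lexicographically smaller violation profile."""
    return profile(*triad_a, ranking) < profile(*triad_b, ranking)

assimilated = ("on", "pa", "ompa")
faithful = ("on", "pa", "onpa")
print(conforms_better(assimilated, faithful, [m_np, f_np]))  # Markedness » Faithfulness
print(conforms_better(assimilated, faithful, [f_np, m_np]))  # Faithfulness » Markedness
```

The same pair of triads thus receives opposite Harmony orderings under the two rankings, which is what allows listening preferences to bear on the relative ranking.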
The work reported in this section shows the initial results of a general research
programme with a wide range of potential applications within grammatical theory.
If the proposed Linking Hypotheses (2) – including the X/Y/XY paradigm – bear
up under experimental investigation, listening times can be used to investigate
many aspects of the grammars of infants. The initial work reported here
involves two dimensions of phonological markedness: syllable structure and
nasal place assimilation. The corresponding markedness constraints are, respectively,
Onset/NoCoda and MNasalPl, a constraint (introduced in section 2.2)
requiring that the place of articulation of a nasal consonant agree with that of a
following consonant.
With respect to syllable structure, we examine a prediction of the Basic
C/V Syllable Theory (Prince and Smolensky 1993: ch. 6): a single intervocalic
consonant will universally be parsed into the onset of the following syllable
rather than the coda of the preceding syllable. That is, . . . VCV . . . will always
be parsed . . . V.CV . . . rather than . . . VC.V . . . , where the period marks the
syllable boundary. Thus, all else equal, stimuli with the syllable structure . . .
V.CV . . . (such as ba.di to.ma . . .) will conform better to any grammar than those
with the structure . . . VC.V . . . (bad.i tom.a . . .). The reason is as follows. These
forms may well violate many markedness constraints – e.g., toma violates a
constraint prohibiting nasals – but the two matched lists will fare equally well
with respect to all constraints except those sensitive to the only difference
between the lists, the location of the syllable boundary. And here, both the
constraints Onset (‘syllables have onsets’) and NoCoda (‘syllables do not
have codas’) are violated by all the forms in the second list and satisfied by
all those in the first. Regardless of where these markedness constraints may
be ranked in the infant’s current grammar, this entails that the first list is more
harmonic than the second, and by hypothesis (2b), infants are thus predicted to
attend longer to the first list – if Onset or NoCoda is indeed present in their
grammars.
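The violation counts behind this argument can be tallied mechanically. The sketch below is illustrative only: it assumes syllabified forms written with a period as the boundary marker, a five-vowel inventory, and hypothetical function names.

```python
VOWELS = set("aeiou")  # assumed vowel inventory for the toy stimuli

def onset_violations(syllable):
    """Onset ('syllables have onsets'): violated by a vowel-initial syllable."""
    return 1 if syllable[0] in VOWELS else 0

def nocoda_violations(syllable):
    """NoCoda ('syllables do not have codas'): violated by a consonant-final syllable."""
    return 0 if syllable[-1] in VOWELS else 1

def violations(word):
    """Total Onset and NoCoda violations for a word such as 'ba.di' or 'bad.i'."""
    syllables = word.split(".")
    return {"Onset": sum(onset_violations(s) for s in syllables),
            "NoCoda": sum(nocoda_violations(s) for s in syllables)}

for form in ["ba.di", "to.ma", "bad.i", "tom.a"]:
    print(form, violations(form))
```

Because the V.CV forms incur no violation of either constraint while the VC.V forms incur one of each, the first list is more harmonic wherever these two constraints happen to be ranked.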
The case of syllable structure was chosen for the initial study in part because
the competing alternatives – different syllabifications (e.g., to.ma vs tom.a) –
are equally faithful to any underlying form.2 There can therefore be no question
of conflict between the markedness constraints Onset/NoCoda and faith-
fulness constraints. This allows for a relatively simple experiment, but does
not allow us to test the hypothesis concerning the relative ranking in the initial
state of Markedness and Faithfulness. For this purpose, we turned to a different
markedness constraint, MNasalPl: satisfying this constraint can require changing
the place of articulation of a consonant, violating Faithfulness and thereby
setting up a conflict of the desired type.
The experimental study of MNasalPl tests all three hypotheses in (1). First,
to test the hypothesis that Faithfulness is present in the infants’ grammars and
relevant to their behaviour in the way hypothesised in (2c), we compare a list
of items of the form om . . . pa . . . ompa in . . . du . . . indu . . . with a matched
list of items of the form om . . . pa . . . indu in . . . du . . . ompa. . . .3 The lists are
designed to fare equally well on all constraints except Faithfulness, which is
satisfied in the first list and violated in the second. In particular, note that the
lists are constructed so that the XY forms do not violate MNasalPl: the final
nasal in X always agrees in place with the initial stop in Y. Thus if Faithfulness
is present in the infants’ grammars and relevant to their behaviour in the way
hypothesised in (2c), infants should attend longer to lists of the first – faithful –
type.
Next, to test whether MNP is in fact present in infants’ grammars, we compare
lists such as ompa . . . indu . . . with matched lists omdu . . . inpa. . . . The first
list satisfies MNP while the second list violates it, and the matching of syllables
employed in the two lists means that the two lists fare equally with respect to
other constraints. Thus if MNP is present in the initial state then, according to
(2b), we predict infants will attend longer to the first type of list.
For direct comparison with the Faithfulness experiment, it is actually prefer-
able to design the Markedness experiment using lists of triads comparable to
those used for studying Faithfulness. The first type of list includes triads of the
form om . . . pa . . . ompa in . . . du . . . indu . . . while the second type of list is a
matched list of items of the form in . . . pa . . . inpa om . . . du . . . omdu. . . . Both
lists satisfy Faithfulness because each ‘XY’ bisyllable is a faithful concatena-
tion of the ‘X’ and ‘Y’ monosyllables. But the consonants coming into contact
in XY were selected so that nasal place assimilation was satisfied in the first list
and violated in the second. Again, we predict from (2c) that infants will attend
longer to the first type of list. This is the paradigm actually employed in the
experiments described below.
Finally, to test whether Markedness » Faithfulness, we must pit FNP against
MNP. Now the first list consists of triads of the form on . . . pa . . . ompa while
the second list contains matched items on . . . pa . . . onpa. The first list violates
FNP but satisfies MNP, and the reverse is true of the second list. The prediction
from the Richness of the Base argument in section 2 is that infants will attend
longer to lists of the first type.
Clearly, if the overall research programme enabled by the linking hypothesis
(2) is viable, experiments of this general sort can be used to address virtually
any putative markedness constraint. The approach may even extend to syntax
in older children.
satisfied Faithfulness (‘✓F’) while the lower Harmony stimuli did not (‘*F’).
In the ‘Markedness vs. Faithfulness’ experiment, the higher Harmony triads
respected Markedness but violated Faithfulness, while the reverse was true of
the lower Harmony triads. In all other respects, the two types of stimuli were
matched in each experiment.
The infants’ listening times to the two types of stimuli presented in each
experiment are shown in the columns headed ‘Higher Harmony’ and ‘Lower
Harmony’. The mean times are shown in boldface, with the standard deviation
(SD) and standard error (SE) shown beneath. The difference in listening times
for each experiment is assessed by several measures in the final column. The
proportion of infants preferring the higher Harmony stimuli is shown as a
fraction with denominator 20 or 16, depending upon whether the data from
20 or 16 infants were used in the experiment. The p value for the difference
in mean listening times, according to a paired t-test, is shown in boldface;
t values are shown beneath the p value.
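For readers who wish to see how the measures in the final column are obtained, the following sketch computes a paired t statistic and the preference proportion. It is an illustration only: the function names are ours, and the listening times are invented numbers for four imaginary infants, not data from the experiments reported here.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(higher, lower):
    """Return (t, degrees of freedom) for a paired t-test on two equal-length lists."""
    diffs = [h - l for h, l in zip(higher, lower)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # SE of the mean difference
    return mean(diffs) / standard_error, n - 1

def proportion_preferring_higher(higher, lower):
    """Return (count, total): infants who listened longer to the higher Harmony list."""
    return sum(h > l for h, l in zip(higher, lower)), len(higher)

# Invented listening times in seconds (NOT the study's data).
higher = [10.0, 12.0, 9.0, 11.0]
lower = [8.0, 9.0, 9.0, 8.0]

t, df = paired_t(higher, lower)
count, total = proportion_preferring_higher(higher, lower)
print(f"t({df}) = {t:.3f}; {count}/{total} prefer the higher Harmony list")
```

The p value is then read from the t distribution with the stated degrees of freedom; that step is omitted here because it requires a statistics library beyond the standard one.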
(3) Results of experiments addressing the initial state
Columns: Markedness; Age (mo); Constraints; Higher Harmony (Example, Mean Time, SD / SE in sec); Lower Harmony (Example, Mean Time, SD / SE in sec); Difference (Proportion, p, t)
15 months. In particular, it is at this age that infants show the first signs of
learning about morphology in their comprehension of language. For example,
Shady (1996) found that neither 12- nor 14-month-old English-learning in-
fants showed significant listening preferences for passages with function words
(functor morphemes) in the proper position (natural passages) over passages
with function words in improper positions (unnatural passages). However, by
16 months, English learners are sensitive to the phonetic and distributional
properties of function words and listen longer to the natural passages. Simi-
larly, Santelmann and Jusczyk (1998) explored English learners’ sensitivity to
the relationship between the auxiliary verb is and the verb ending -ing. The -ing
morpheme is one of the earliest acquired morphemes in children’s productions
of English (Brown 1973, De Villiers and De Villiers 1973). Santelmann and
Jusczyk found that sensitivity to the basic dependency between the function
morpheme is and the morpheme -ing develops between 15 and 18 months of
age. Together, these two studies provide evidence that, beginning at approx-
imately 15 months of age, infants are becoming sensitive to the presence of
morphemes in the speech around them.
As L. Benua (personal communication, September 2000) has pointed out to
us, the beginnings of morphology pose difficulties for the child’s analysis of
nasal assimilation in English. Presented with a single-syllable item like im in
a stimulus triad of the experiment, 15-month-old infants may now explore the
new possibility that this is a prefix, combining with the following root po to form
a morphologically complex word impo. Now under this analysis, it is unclear
whether a prefix-final nasal should assimilate to a following consonant: the
evidence available from English is mixed. As we have seen, an ‘inner’, Level 1,
affix like in- ‘not’ will assimilate; an ‘outer,’ Level 2, affix like in- ‘directed
inward’ will not. Until the child’s morphology becomes sophisticated enough to
sort out such subtleties, the evidence will provide conflicting information about
whether it is Markedness or Faithfulness that should dominate here. The result
might well be indefiniteness of ranking of the sort exhibited by the 15-month-
olds in our study.
By 20 months it may be that children are no longer treating the hypothetical
forms in the experiment as novel affixes. Perhaps their receptive lexicons are
now substantial enough that the complete absence of recognisable lexical items
suggests to them that, while the sound patterns they are hearing are highly
English-like, these are not actually English lexical items. Thus perhaps the 20-
month-olds are not attempting lexical and morphological analysis, just as an
adult presumably would not. The forms are being treated as morphologically
simple items, subject to the basic patterns of English phonology, which, as
we have seen, demand that nasals assimilate. The constraints that are being
applied to the experimental stimuli are the most basic ones, where nasal place
Markedness dominates Faithfulness.
3.4 Extensions
We have discussed initial work in an experimental research programme that
attempts rather directly to observe the presence and ranking of grammatical
constraints in infants too young to be producing any linguistic utterances. If
this research programme proves successful, it may become possible for such
experimental observation directly to inform theory development, even with
respect to some of the most theoretically intricate issues in linguistic theory.
Take nasal place assimilation, for example. It can be exploited to examine a
deep theoretical question concerning OT: must the type of optimisation used in
OT to date be modified to allow constraints to be highly selective in the forms
that they compare? In recent work, Wilson (2000) proposes such a fundamental
modification of the theory in order to address what he argues to be a funda-
mental problem in standard OT. While the problem is much more general, it is
nicely illustrated by nasal place assimilation. (Here we apply Wilson’s general
approach to the particular case of nasal assimilation; this is not a case Wilson
treats explicitly.) It appears to be a strong empirical generalisation that when
a nasal comes in contact with a following consonant with a different place of
articulation, if place assimilation occurs, it must be the nasal that assimilates.
Within standard OT, an output constraint like MNasalPl that requires a nasal
consonant N to agree in place with a following consonant C can be satisfied by
changing the place of C, and, as shown in a number of such cases in Wilson
(2000), this entails that in the typology predicted by standard OT, there are lan-
guages in which it is C not N that assimilates under certain general conditions.
In Wilson’s proposed modification of OT, the markedness constraint MNasalPl
is replaced with a different type of constraint which we shall call → MNasalPl.
Now MNasalPl asserts that a form in which a nasal fails to agree in place with a
following consonant is less harmonic than any form lacking such an agreement
failure. But → MNasalPl is a targeted constraint: it asserts only that a form S in
which a nasal fails to agree in place with a following consonant is less harmonic
than the single form S′ which is exactly like S except that the nasal’s place has
changed to agree with that of the following consonant.
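The difference between the two constraint types can be illustrated by the Harmony orderings each one asserts. The sketch below is a simplification under our own assumptions: a two-place mini-inventory, hypothetical function names, and forms represented as plain strings. It does not implement the TCOT definition of optimisation, only the pairwise preference each constraint states.

```python
# Hypothetical mini-inventory for illustration.
STOP_PLACE = {"p": "labial", "b": "labial", "t": "coronal", "d": "coronal"}
NASAL_PLACE = {"m": "labial", "n": "coronal"}
NASAL_AT = {"labial": "m", "coronal": "n"}

def disagreements(form):
    """Indices of nasals that disagree in place with an immediately following stop."""
    return [i for i in range(len(form) - 1)
            if form[i] in NASAL_PLACE and form[i + 1] in STOP_PLACE
            and NASAL_PLACE[form[i]] != STOP_PLACE[form[i + 1]]]

def nasal_fix(form):
    """S': exactly like S except that each disagreeing nasal takes the stop's place."""
    segs = list(form)
    for i in disagreements(form):
        segs[i] = NASAL_AT[STOP_PLACE[segs[i + 1]]]
    return "".join(segs)

def standard_prefers(better, worse):
    """Standard MNasalPl: any form with fewer disagreements is more harmonic."""
    return len(disagreements(better)) < len(disagreements(worse))

def targeted_prefers(better, worse):
    """Targeted -> MNasalPl: only S' is asserted to be more harmonic than S."""
    return bool(disagreements(worse)) and better == nasal_fix(worse)

print(standard_prefers("ompa", "onpa"), targeted_prefers("ompa", "onpa"))  # nasal assimilates
print(standard_prefers("onda", "onpa"), targeted_prefers("onda", "onpa"))  # stop assimilates
```

The standard constraint favours both repairs of onpa, whereas the targeted constraint is silent about the form in which the stop rather than the nasal has changed.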
Targeted constraints solve this and many other important problems for standard
OT. But the formal definition of optimisation required in the new theory –
called TCOT (‘TC’ for ‘targeted constraints’) – must be made significantly
more complex in order to deal with the new possibility that constraints can
be targeted, because such constraints assert a Harmony relationship only
between a small set of closely related pairs of forms (like S and S′).
A new source of evidence that a move from OT to the more formally com-
plex TCOT is empirically warranted might be provided by experiments of the
sort discussed above. According to standard OT, triads like on . . . pa . . . onda
should satisfy MNasalPl and thus be preferred to their faithful counterparts
(8) M1 ›› M2 ›› F ›› M3
This means that the Chinese borrowings will be ‘repaired’ to satisfy M1, but
violations of M2 will now be tolerated. Such is actually the case in Sino-Japanese
forms with the constraints in (5).
We propose to explore what we believe to be a new interpretation of this
state of affairs: at the time when the Chinese forms were being borrowed, the
base adult Japanese grammar actually was (4). The ranking M1 » M2 was a
hidden ranking: there was no evidence for it in the Japanese lexicon at that
time. The forms violating M1 and those violating M2 define two hidden strata
of Japanese, a covert distinction among Japanese-illegal forms.
Suppose furthermore that at the time of borrowing, the ranking of F was
somewhat variable: it was a floating constraint (Reynolds 1994, Zubritskaya
1994, 1997, Nagy and Reynolds 1997, Anttila 1998, Boersma 1998, Boersma
and Hayes 1999, Legendre et al. 2000):
(10) M1 » F » M2 » M3
When F occupies the lowest ranking in its range, we shall say it is in its base
position, and the resulting grammar we shall call the base ranking (4). In both
of the total rankings generated by the floating constraint – (4) and (9) – M1 will
be respected in all outputs; but M2 will be respected only by the outputs of the
base grammar. Suppose we assume, as a mutual constraint between the lexicon
and the grammar, that a native vocabulary item must be assigned a consistent
pronunciation by the grammar: that is, the output must be the same for both
rankings (4) and (9). The native forms, then, must all obey M2. The result is
that, considering only these native lexical forms, there is now no evidence for
the true full ranking (9) – only for the partial ranking (7).
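The consistency requirement on native vocabulary can be sketched as follows. This is an abstract illustration under our own assumptions: inputs are modelled as sets of marked structures ('m1', 'm2', 'm3'), candidates repair an input by deleting some of them, and the two total rankings are taken to be M1 » M2 » F » M3 (F in its base position) and M1 » F » M2 » M3 (F elevated). All names are hypothetical.

```python
from itertools import combinations

BASE = ["M1", "M2", "F", "M3"]       # F in its base (lowest) position
ELEVATED = ["M1", "F", "M2", "M3"]   # F floated above M2

def candidates(inp):
    """All outputs obtained by removing zero or more marked structures."""
    feats = sorted(inp)
    return [frozenset(c) for r in range(len(feats) + 1)
            for c in combinations(feats, r)]

def violations(name, inp, out):
    """F counts removed structures; Mi is violated if structure 'mi' survives."""
    if name == "F":
        return len(inp - out)
    return 1 if name.lower() in out else 0

def output(inp, ranking):
    """Optimal candidate: lexicographically smallest violation profile."""
    return min(candidates(inp),
               key=lambda out: tuple(violations(c, inp, out) for c in ranking))

def consistent(inp):
    """A possible native item must receive the same output under both rankings."""
    return output(inp, BASE) == output(inp, ELEVATED)

for structure in ["m1", "m2", "m3"]:
    inp = frozenset({structure})
    print(structure, sorted(output(inp, BASE)), sorted(output(inp, ELEVATED)),
          consistent(inp))
```

In this toy system an input violating M2 is repaired by the base ranking but surfaces intact when F is elevated, so it cannot be a native item; inputs violating M1 are repaired under both rankings, and inputs violating only M3 surface under both.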
But the non-native inputs provided by Chinese allow us to see the hidden
rankings in the final state. Suppose that, perhaps with the commitment of
(e) Thus for a native input, the output of the grammar is non-variable,
and therefore equal to the single output of the base grammar (with
faithfulness constraints at the bottom of their floating range). For
native inputs, only the base grammar need be considered; floating
can be ignored.
(f) Inputs that are not drawn from the native lexicon will in general
yield variable outputs; with Faithfulness elevated from the base
grammar, outputs of non-native inputs will in general violate the
surface generalisations of the language.
(g) It follows that the base grammar satisfies standard Richness of the
Base: it generates only outputs respecting the surface generalisa-
tions of the language, given any input whatsoever.
An argument for the crucial final point (11g) goes as follows. Suppose there
were some possible input I which the base grammar mapped to an output
O that violated a surface generalisation. This would arise because the rel-
evant markedness constraints were outranked by relevant faithfulness con-
straints in their base position. But then I would also yield O with Faith-
fulness elevated from its base position. Thus I would yield O consistently
across the full grammar with floating constraints, so I would be a po-
tential underlying form. Since there are no other constraints on underly-
ing forms, there is nothing to bar such an I from the lexicon, thereby de-
stroying the observed surface generalisation.5 Therefore such an I cannot
exist.
An important feature of Extended Richness of the Base (11) is that it sub-
sumes previous work in OT under standard Richness of the Base: this work
is now taken to be the study of base grammars, which are determined by the
inventory of native forms. The consequences of floating Faithfulness only arise
when analysing the partial nativisation evident in actual outputs from non-
native inputs, or the residue of such partial nativisation in borrowing-induced
lexical strata. It is precisely for such analysis that floating Faithfulness has been
previously invoked.
Knowing the articulated depiction of the final ranking provided in the covert
form displayed in (10) – rather than the less-informative base form illustrated
in (7) – is important not just for precisely accounting for the partial nativisation
of foreign words, but also for second language acquisition. It is commonly
assumed in the L2 literature that for adults learning a second language, the
initial state for L2 acquisition is the final state of L1 (Broselow and Finer 1991,
Archibald 1993, Broselow and Park 1995, Pater 1997, Broselow, Chen, and
Wang 1998, Hancin-Bhatt and Bhatt 1998). But the unarticulated base form
of the grammar of L1 is generally silent on distinctions that are important in
L2. For inputs from the L1 vocabulary, the base form suffices to determine the
optimal output, but for the foreign inputs of L2, other rankings can be crucial.
And in order to account for early L2 productions and the influence of L1 on the
initial course of L2 acquisition, the ability of L1 speakers to elevate faithfulness
constraints in the face of foreign inputs needs to be precisely specified, as it is
in the covert form of the ranking.
Because ultimate application to L2 was a primary motivation for the work re-
ported below, the phonological domain we chose to explore in our experimental
work was one that has received considerable prior attention in the L2 literature:
consonant clusters. Previous work has focused on cases in which L2 is English,
and L1 is a language with an inventory of clusters that is impoverished relative
to English (Broselow 1983, Anderson 1987, Tropf 1987, Broselow and Finer
1991, Eckman and Iverson 1993, Archibald 1998, Carlisle 1998, Hancin-Bhatt
and Bhatt 1998). Due to constraints on the availability of experimental par-
ticipants, we chose English for L1, and tested the production of clusters that
are illegal in English. As a convenient way to get a large number of fluently
produced English-illegal clusters, we used a Polish speaker to generate the ex-
perimental stimuli. While our results have implications for English-speaking
learners of Polish, that is not the topic of this research. Rather, our goal is
to examine the consonant-cluster component of the final state of the English
grammar.
The target item – vzety in the diagram (13) – was presented in both aural
and written form; the orthography was English-like (not IPA or Polish). The
written forms were intended to decrease the probability of misperception of
the consonants in the cluster; it is likely that the combined aural and visual
The clusters fall into four rather distinct groups. Performance is above 90 per-
cent on all English-legal clusters, assuming šl to be legal at least in the East Coast
American dialect predominating in our experimental participants. Performance
drops to 63 percent for what we will call the ‘Easy’ non-English clusters, zm and
zr. Performance drops again to 34–39 percent for the ‘Intermediate’ clusters
kt, pt, kp, čk. The cluster tf falls halfway between the Intermediate and Easy
clusters; it is convenient for analysis to group it with the Intermediate clusters,
which can then be characterised as the voiceless-stop-initial clusters. (Obvi-
ously, more extensive and systematic follow-up experiments will be needed for
a more definitive classification of clusters of this type.) Finally, dropping to
11–19 percent correct, we find the ‘Difficult’ clusters vn, vz, and dv.
While the labels ‘Easy/Intermediate/Difficult’ are transparent, it should be
emphasised that the dependent measure in the experiment is accuracy, not
difficulty per se.
Pairwise comparison of performance on individual clusters shows statistical
justification for the post-hoc three-way grouping of the English-illegal clusters.
In (15) we summarise the significance levels of performance differences, taking
p < 0.1 as our criterion. The greyed cells correspond to comparisons of clusters
in the same class; these should not be significantly different, and they are
not. The white cells correspond to comparisons of clusters in different classes;
these should show significant differences, and they do. The only exceptions are
indicated by the numbers in (15), which give p levels too high to satisfy the
criterion. We have only listed one cluster in the English-legal class, since all
these clusters behave alike statistically.
At the boundary between the Intermediate and Difficult classes, the comparison
kt vs. dv barely fails the criterion of significance, with p < 0.12. The other
exceptions all involve tf ; it fails to be significantly different from both members
of the Easy class (zr, zm); while significantly different from the two members
of the Intermediate class on which performance was worst (pt and kt), it fails
to be significantly different from the other two (although with kp it is close).
(While there may be some indication here that tf is better classified as Easy, we
shall await more significant results before complicating the analysis proposed
below in order to account for such a classification.6)
As a final, within-speaker means of confirming our grouping of clusters,
we test the following prediction implicit in the grouping: each subject should
produce more correct forms for clusters of the Easy class than for clusters of the
Intermediate class, and similarly more for the Intermediate than for the Difficult
class. In the scatter plots shown in (16), each point represents the performance
of a single speaker. In the first plot, proportion correct on Easy clusters is plotted
on the y-axis, while Intermediate cluster performance is plotted on the x-axis.
All points are predicted to lie above the principal diagonal (y = x), that is, in the
upper left triangle. This prediction is confirmed, with only two speakers failing
to meet it. The significance of this result is p < .002. Exactly the same result
holds in the Intermediate/Difficult comparison shown in the second scatter plot.
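The text does not say which test produced this significance level. As a rough plausibility check, one simple option is a one-sided sign test: if each of the sixteen speakers were equally likely to fall above or below the diagonal, how probable would it be for fourteen or more to fall above it? The choice of test and the function name below are assumptions made for illustration only.

```python
from math import comb


def sign_test_upper_tail(successes, n):
    """One-sided sign test: P(X >= successes) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


# All but two of the sixteen speakers lie above the diagonal.
p_value = sign_test_upper_tail(14, 16)
print(f"{p_value:.4f}")
```

Under these assumptions the tail probability is 137/65536, roughly 0.002, which is of the same order as the reported level; the test actually used may well have differed.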
Implications and explorations of Richness of the Base 347
(16) [Scatter plots, one point per speaker; both axes show proportion correct, from 0.0 to 1.0. First plot: Easy (y-axis) against Intermediate (x-axis). Second plot: Intermediate (y-axis) against Difficult (x-axis).]
The experimental results seem to suggest that the grammar of English initial
clusters contains ‘hidden strata’ among the illegal forms, each group of clusters
defining a stratum. Inputs with clusters of the Easy stratum are most likely
to produce faithful, English-illegal outputs, with no nativisation. Inputs with
clusters in the Intermediate and Difficult strata are successively less likely to
surface faithfully, that is, more likely to undergo nativisation.
The ‘repairs’ performed when English-illegal inputs failed to surface faith-
fully were overwhelmingly epenthesis, as shown in (17).
(17) [Bar chart: Error Types]
4.4 Analysis
We now examine whether phonological theory can shed light on the hidden
strata revealed by the experiment. One hypothesis already proposed in the lit-
erature, which in fact guided the selection of the clusters used, is that sonority
distance should play a primary role. English allows some clusters with a sonority
distance of 1 (e.g., sn) but none with sonority distance 0 (e.g., *pk). This
hypothesis predicts then that the class of English-illegal clusters with a legal
sonority distance (1 or more) should be produced more accurately than the class
of clusters with the illegal sonority distance 0. As shown in (18), however, this
is not the case.
In (18) the clusters are listed in order of increasing proportion correct; there is lit-
tle correlation between this and the distinction between sonority distances. The
lack of performance difference between the two sonority-distance categories
was evident in a one-way repeated measures Analysis of Variance (ANOVA)
across the mean proportion correct for each category; among English-illegal
clusters, there was no significant effect either by subjects [F1(1,15) = 2.65,
p > .12] or by items [F2(1,38) = 1.75, p > .19]. (There was, of course, a very
significant effect of the English-legal vs English-illegal factor among the clus-
ters with non-zero sonority distance; p < .0001 by both subjects and items.)
Given that sonority distance appears to distinguish clusters too coarsely, we
turn to a more fine-grained analysis in terms of phonological features, many of
which provide the substrate for sonority conditions on onset well-formedness.
Can the markedness of different feature combinations, suitably formalised,
account for the hidden strata? The table in (19) is suggestive.
What the table shows is that as the clusters get worse, as measured by perfor-
mance in the experiment, they also get more and more marked. In (19), marked
feature values along various dimensions have been highlighted; the basic
markedness constraints involved are given in (20). The markedness of [+voice]
for obstruents is indicated by highlighting this feature in boldface; the marked-
ness of non-coronal place, by bold italics. The markedness of [+continuant] for
obstruents (or perhaps for onsets) is indicated by boldface on F = fricative. And
underlining highlights the markedness of a stop followed by a non-approximant
(Ā): in this environment, the stop is marked because it is unreleased, as indi-
cated in (19) by S . We return to this dimension of markedness in a moment.
(21) Local conjunction enables inventories that ban only the worst of the worst

                 *F & *Voi −son   Max   *F   *Voi −son
/blik/  blik                                    *
        lik                       *!
/flik/  flik                            *
        lik                       *!
/vlik/  vlik           *!
        lik                       *
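The evaluation in tableau (21) can be sketched in a few lines of code. This is a minimal illustration, not the authors' implementation: the toy segment classes, the function names, and the restriction to two candidates per input are all assumptions, and the relative order of *F and *Voi −son (which the tableau does not make crucial) is simply taken from the column order.

```python
# Toy segment classes covering only the onsets in tableau (21).
FRICATIVES = {"f", "v"}
VOICED_OBSTRUENTS = {"b", "v"}


def star_f(inp, out):
    # *F: one violation per fricative in the output.
    return sum(seg in FRICATIVES for seg in out)


def star_voi(inp, out):
    # *Voi(-son): one violation per voiced obstruent in the output.
    return sum(seg in VOICED_OBSTRUENTS for seg in out)


def star_f_and_voi(inp, out):
    # Local conjunction: violated only when a single segment
    # violates both conjuncts at once.
    return sum(seg in FRICATIVES and seg in VOICED_OBSTRUENTS for seg in out)


def max_seg(inp, out):
    # Max: one violation per input segment missing from the output.
    return len(inp) - len(out)


# Highest-ranked constraint first.
RANKING = [star_f_and_voi, max_seg, star_f, star_voi]


def optimal(inp, candidates, ranking=RANKING):
    # Strict domination amounts to lexicographic comparison of violation profiles.
    return min(candidates, key=lambda out: tuple(c(inp, out) for c in ranking))


for inp in ("blik", "flik", "vlik"):
    # Candidates: the faithful form and the form with the first consonant deleted.
    print(inp, "->", optimal(inp, [inp, inp[1:]]))
```

With this ranking the faithful candidates blik and flik win, while vlik loses to lik: only the segment that is both voiced and a fricative is excluded.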
(22) *SĀ A stop must release into an approximant in onsets (no S before
Ā)
*SĀ is related to the following constraint, which prefers CV syllables.
(23) *CC A consonant must release into a vowel in onsets (no C before
C).
The interpretation of this table (24) may be illustrated as follows. The clusters
zr, zm are more marked than legal English clusters because the initial segment is
the locus of three simultaneous constraint violations: *Voi −son , *F, and *CC ;
the three marks ‘’ incurred by the initial z in these clusters entail violation
of the local conjunction *Voi −son & *F & *CC . Note that violating any two
of these constraints in the same segment does not generate sufficient Markedness
to result in exclusion from English. Violating *F and *CC in the same segment is
permitted, as in the cluster fl; violating *Voi −son and *CC is likewise acceptable
(bl). And violation of both *Voi −son and *F occurs in every single voiced
fricative (z).
Note that all clusters violate *CC , which functions like the *Complex of
Prince and Smolensky (1993), except that it explicitly localises its violation to
the initial C of a CC cluster. Thus, to violate a local conjunction C & *CC ,
the violation of C must occur in the initial segment of a CC cluster (or, more
generally, in a C that is not the final segment of a cluster).
To take one more example from table (24): like the Easy clusters, the Difficult
clusters vn, vz also violate the third-order conjunction [*Voi −son & *F & *CC ].
In addition, however, the initial consonant v has marked place, that is, it violates
*Cor. Thus the increased markedness of these Difficult clusters is registered by
their violation of the fourth-order conjunction *Cor and [*Voi −son & *F &
*CC ].
It is clear in (24) that constraint ranking plays an important role in distin-
guishing the hidden strata. The cluster tf , like many English legal clusters,
violates only two of the listed constraints, yet it is in the Intermediate stratum,
while the cluster zr, violating three constraints, falls in the Easy stratum. Un-
like zr, the cluster tf violates *SĀ , a constraint that is unviolated among the
English-legal clusters. The high rank of *SĀ is responsible for its strong im-
pact, single-handedly separating the Easy clusters (which satisfy it) from the
Intermediate clusters, which do not.
The formal definition of each conjunctive constraint in (24) follows automat-
ically from the general definition of constraint conjunction, together with the
substantive definitions of the basic constraints that are conjoined, given above
in (20), (22), and (23). It is useful, however, to provide an English paraphrase
of the effect of these conjunctions; this is given in (25).
(25) Exegesis of local conjunctions (within onset clusters)
(a) *Voi −son & *F & *CC: Violated by a voiced fricative before any C.
(b) *Cor & *Voi −son & *F & *CC: Violated by a non-coronal voiced fricative before any C.
(c) *Voi −son & *SĀ: Violated by a voiced stop before an obstruent or nasal.
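The paraphrases in (25), together with the stop-release constraint in (22), are enough to sort the experimental clusters mechanically. The sketch below is illustrative only. The feature table, the ASCII labels "ch" and "sh" for the affricate and the palato-alveolar fricative, and the function names are assumptions; the affricate is treated as a stop, as in note 6. The mapping from violated constraints to strata follows the grouping described in the text: (25b) and (25c) mark the Difficult stratum, the stop-release constraint alone the Intermediate stratum, and (25a) alone the Easy stratum.

```python
# Hypothetical feature table: segment -> (manner, voiced, coronal).
SEGMENTS = {
    "p": ("stop", False, False), "t": ("stop", False, True),
    "k": ("stop", False, False), "ch": ("stop", False, True),
    "b": ("stop", True, False), "d": ("stop", True, True),
    "f": ("fricative", False, False), "s": ("fricative", False, True),
    "sh": ("fricative", False, True), "v": ("fricative", True, False),
    "z": ("fricative", True, True),
    "m": ("nasal", True, False), "n": ("nasal", True, True),
    "l": ("approximant", True, True), "r": ("approximant", True, True),
}


def violations(c1, c2):
    """Return which of the constraints in (22) and (25) the onset c1+c2 violates."""
    manner1, voiced1, coronal1 = SEGMENTS[c1]
    manner2 = SEGMENTS[c2][0]
    stop_release = manner1 == "stop" and manner2 != "approximant"  # (22)
    conj_a = manner1 == "fricative" and voiced1                   # (25a)
    conj_b = conj_a and not coronal1                              # (25b)
    conj_c = stop_release and voiced1                             # (25c)
    return {"stop_release": stop_release, "a": conj_a, "b": conj_b, "c": conj_c}


def stratum(c1, c2):
    """Assign a cluster to a stratum by the highest-ranked constraint it violates."""
    v = violations(c1, c2)
    if v["b"] or v["c"]:
        return "Difficult"
    if v["stop_release"]:
        return "Intermediate"
    if v["a"]:
        return "Easy"
    return "none"  # none of the listed constraints is violated


for cluster in [("z", "m"), ("t", "f"), ("d", "v"), ("v", "n"), ("s", "n")]:
    print("".join(cluster), stratum(*cluster))
```

The label "none" means only that the cluster violates none of these particular constraints; it is not a claim that every such cluster is a legal English onset.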
The ranking implicit in the table (24) is made explicit in (26). The floating
range of Dep is shown, with four positions marked by the circled digits. The
base position ❶ defines the base grammar of English, which admits only those
clusters appearing in English lexical items. Elevating Dep to the next-higher
position ➁ now adds to the inventory the group of Easy clusters. Elevating Dep
still further, to position ➂ and then to ➃, incrementally adds the Intermediate
and then the Difficult clusters.
(26) English grammar, showing hidden constraint rankings and hidden clus-
ter strata
It is important that the theory correctly accounts for the qualitative ordering of
the strata, regardless of the quantitative assumptions concerning the probability
of Dep floating to each of the four positions. Each position that yields correct
production in one stratum s also yields correct production in the strata listed
above s in (27). Thus, whatever the quantitative value of the probabilities of
visiting different positions, the predicted percent correct will increase up the
table. (This would even be true if the probability were greatest for ➃ and least
for ❶.)
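The qualitative prediction can be checked with a small calculation. In the sketch below, a stratum surfaces faithfully whenever Dep floats to that stratum's position or higher, so its predicted accuracy is the sum of the visitation probabilities from that position upward. The function name and the probability values are hypothetical; the values are deliberately chosen to be greatest for the highest position and least for the base position, the extreme case mentioned above.

```python
STRATA = ["English-legal", "Easy", "Intermediate", "Difficult"]


def predicted_accuracy(visit_probs):
    """visit_probs[i] is the probability that Dep floats to position i + 1.

    Stratum i is produced faithfully at position i + 1 and at every higher
    position, so its predicted accuracy is the tail sum of the probabilities.
    """
    if len(visit_probs) != len(STRATA):
        raise ValueError("expected one probability per position")
    if abs(sum(visit_probs) - 1.0) > 1e-9:
        raise ValueError("visitation probabilities must sum to 1")
    return {name: sum(visit_probs[i:]) for i, name in enumerate(STRATA)}


# Hypothetical probabilities, increasing with elevation.
accuracy = predicted_accuracy([0.1, 0.2, 0.3, 0.4])
for name, value in accuracy.items():
    print(f"{name}: {value:.2f}")
```

Because each tail sum contains the next one, the predicted accuracy can never be higher for a more marked stratum than for a less marked one, whatever probabilities are assumed.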
Using the observed percent correct values, it is easy to fit the data with
assumed values for the visitation probabilities, as shown in the portion of the
table in (27) headed ‘Empirically Fit Floating’. Thus the experimental results
suggest that the probability of elevating Dep to different positions decreases
as the degree of elevation increases; most probable is the base position ❶ (33
percent), least probable the highest position ➃ (16 percent).
The values for visitation probabilities in (27) are fit to average accuracy
values over all subjects. The scatter plots in (16) – with a separate point for
each speaker – show considerable variance across speakers. On the analysis we
are proposing, this variance may be attributed to differences across speakers
and trials of the effectiveness of cognitive resources allocated to the eleva-
tion of Faithfulness. The relative rankings of markedness constraints, how-
ever, are what define the hidden strata, and the evidence is consistent with
a common underlying hidden ranking of Markedness across speakers. While
one speaker might be unusually successful at elevating Faithfulness, a shared
hidden Markedness ranking entails that, no matter how good that speaker’s
performance may be on Difficult clusters, that same speaker’s performance on
Intermediate clusters must be better, and on Easy clusters, better still. Averaging
within speakers over the clusters defining each hidden stratum, (16) shows that
this prediction is borne out to a striking degree, holding for all but two of sixteen
speakers.
The experimental results are shown in the graph (28). As the graph shows,
there is a very strong correlation between the performance ordering of clusters
across the three conditions. In fact, the rank correlation between the English and
Repetition conditions is 0.97, that between the English and Foreign conditions
is 0.91, and between the Foreign and Repetition is 0.89 (all p <.001).
There is some evidence for discrete strata in the Foreign Condition, and while
these correlate with those of the English Condition, the boundaries of the In-
termediate stratum seem to shift: čk approaches the Difficult group while the
others approach the Easy group. Evidence for strata in the Repetition Condition
is even murkier, perhaps because a ceiling effect is starting to set in: as perfor-
mance starts to approach 100 percent, there is less variation across the majority
of clusters and so achieving statistically significant differences requires more
experimental power. We leave detailed investigation of such questions to future
experiments which will aim to provide that power.
As far as the manipulation of instructions is concerned, there is
little evidence that conscious attempts to ‘sound American’ vs. ‘sound foreign’
affect the relative accuracy of the clusters, although the proportion of faithful
utterances increases for nearly all clusters. The only exceptions are sn, sm,
where speakers displayed a tendency to produce šn, šm.
(28) [Graph: proportion correct for each cluster in the three conditions; clusters on the x-axis in the order vz, vn, dv, chk, kt, pt, kp, tf, zm, zr, shl, sn, shr, sm, fr.]
for non-English outputs is not seen as part of the final state per se. A much
more severe weakening of the grammatical perspective would in addition deny
that covert Markedness rankings are responsible for relative accuracy – that
relative accuracy is not a consequence of the grammar which, say, treats all
non-native forms equally. Instead, explanation of relative accuracy might be
sought in the statistics of English, with greater frequency predicting greater ac-
curacy. Of course, in onset position, the frequency of the non-English clusters
is nominally zero, by definition. But one might look to the relative frequency
of hetero-syllabic and coda clusters for explanation.
A frequency count from the CELEX 2 Lexical Database is not encouraging
for the statistical perspective, however.8 As shown in (29), whereas a statistical
explanation would require that accuracy is a monotonically increasing function
of frequency, there is in fact essentially no correlation between the orders of
relative accuracy and relative frequency, whether of tokens or types. The rank
correlation coefficient between frequency and accuracy in the experiment is
only 0.14 (p > 0.3) for tokens, and 0.06 (p > 0.4) for types. The poor rank
correlation between frequency and accuracy is also shown in (30). In (30a),
the clusters are listed from lowest to highest accuracy in the experiment, with
boldface marking the Difficult stratum, bold italics marking the Intermedi-
ate stratum, italics marking the Easy stratum, and regular typeface indicating
English-legal clusters. In (30b–c), the clusters are listed in order of increasing
token and type frequencies.
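For readers who wish to run this kind of comparison themselves, a rank correlation can be computed as the ordinary correlation between two rank vectors. The sketch below is a minimal pure-Python version with average ranks for ties. The function names are assumptions, and the accuracy and frequency figures are invented for illustration; they are not the counts reported here.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Invented values for five clusters, for illustration only.
accuracy = [0.11, 0.19, 0.34, 0.39, 0.63]
frequency = [40, 900, 12, 3000, 150]
print(round(spearman(accuracy, frequency), 2))
```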
[Scatter plots: Token Frequency (first plot) and Type Frequency (second plot), each on a logarithmic axis, against Proportion Correct (Experiment), from 0.00 to 1.00.]
C1 and C2 . This grammar also allows the initial cluster pl to surface faithfully,
but not the illegal clusters pt (Intermediate stratum) and dv (Difficult stratum);
as in the analysis above, these clusters are broken up by epenthesis. In this
tableau the unfaithful candidates are in italics, and a preceding ‘*’ marks faithful
candidates that are non-optimal; these flag the clusters banned by this ranking.
Weak vowels in candidates are marked V to distinguish them from other vowels,
marked V́. The sub-hierarchy of markedness constraints *Voi −son & *SĀ »
*SĀ » *CC constitutes the hidden rankings proposed above which separate the
hidden strata occupied by the clusters dv, pt, and pl.
Now elevating *σ̆ just above Max-V enables some weak vowels to be
deleted for rapid speech. Such deletion typically creates clusters, and these are
subject to the hierarchy of constraints with the covert rankings in question.
With minimal elevation of *σ̆ , deletion of weak vowels occurs if this creates
an English-legal cluster, but is blocked if an illegal cluster would result. This is
illustrated in tableau (33).
When the ranks of the constraints violated by illegal clusters are infiltrated
by floating *σ̆ , their relative rankings are exposed. Tableau (34) illustrates a
degree of elevation in which *σ̆ is still ranked below the constraints violated
by clusters in the Difficult stratum, but above those violated by clusters of the
Intermediate stratum. Now weak vowels delete when this creates a cluster that is
either English-legal (pl), or within the first two (Easy; Intermediate: pt) hidden
strata of the illegal clusters. Weak vowel deletion is however blocked when an
illegal cluster in the third (Difficult, dv) hidden stratum would result.
Notes
1. This morpheme in- ‘not’ fails to assimilate to other following consonants, however,
as in incompetent, ingratitude, informal, and involuntary.
2. We are assuming here that syllabification is not present in underlying forms. This
is commonly assumed to account for the strong cross-linguistic generalisation that
syllable structure is not lexically contrastive. If syllable structure were present in
underlying forms, the predictions stated in the text would still follow, but now because
of the ranking structure of the initial state: Faithfulness to underlying syllable structure
would be dominated by Markedness. See the discussion of nasal assimilation below
for an experimental approach to testing such a ranking.
3. There were six lists of each of the two types in each experiment. The X items had the
form VC or CVC, with the initial C an obstruent and the final C one of the nasals n, m,
or ŋ; in each list of eight triads, all X forms had the same nasal. The Y items had the
form CV with C a voiced or unvoiced obstruent and V a vowel occurring word-finally
in English. X, Y, and XY were each naturally produced (by a female native speaker
of English) as a single prosodic word; XY was produced with initial stress and was
not an English word. Triads were separated by 1 sec. and items within each triad were
separated by 0.5 sec. The total duration of each list was approximately 25 sec.
4. In each of experiments 3–5, 16 infants (7 males, 9 females), from monolingual
English-speaking homes, were tested. The average age of the infants for each exper-
iment fell between 4 months, 16 days and 4 months, 20 days, with ages of individual
infants ranging from 4 months, 2 days to 5 months, 8 days. Additional infants
(6 for experiments 3 and 4, 14 for experiment 5) were tested but not included for
reasons of excessive fussiness or crying, failing to orient properly to the test appara-
tus, experimenter error, or parental interference.
5. This abbreviated argument assumes a simplified situation like that discussed above, in
which a single faithfulness constraint is floating. An argument considering the fully
general case, with multiple faithfulness constraints that may change their relative
ranking by floating, is beyond the scope of the present discussion.
6. A natural direction for refining the analysis would be to consider tf to occupy its own
stratum between Easy and Intermediate, and analyse it by extending the hierarchy
proposed below for consonant release: *SĀ » *CC (22)–(23). Introducing a new
constraint in this family, ranked above the other two, *S[−con] : A stop must not release
into a non-continuant, would distinguish tf , which satisfies the new constraint, from
the Intermediate clusters, which violate it. (Although the status of the affricate in čk
is a complexity we systematically ignore here, simply treating it like a stop.)
7. Local conjunction has been employed for a variety of purposes in OT phonology
and syntax; see Smolensky (1993, 1997); also, e.g., Kirchner (1996), Aissen (1999),
Legendre (2000), and Lubowicz (2002).
8. We are extremely grateful to Matt Goldrick for his assistance with these counts.
References
Aissen, J. (1999). Markedness and subject choice in Optimality Theory. NLLT 17: 4.
673–771.
Anderson, J. (1987). The Markedness Differential Hypothesis and syllable structure
difficulty. In G. Ioup and S. Weinberger (eds.) Interlanguage Phonology: the
Acquisition of a Second Language Sound System. Cambridge: Newbury House.
Anttila, A. (1998). Deriving variation from grammar. In F. Hinskens, R. van Hout, and
W. L. Wetzels (eds.) Variation, Change, and Phonological Theory. Amsterdam: John
Benjamins. 35–68.
Anttila, A. and V. Fong (2000). The partitive constraint in Optimality Theory.
[ROA 416, http://roa.rutgers.edu]
Anttila, A. and Y. Y. Cho (to appear). Variation and change in Optimality Theory.
Lingua.
Archangeli, D. and D. Pulleyblank (1994). Grounded Phonology. Cambridge, Mass.:
MIT Press.
Levelt, C. and R. van de Vijver (this volume). Syllable types in cross-linguistic and
developmental grammars.
Lubowicz, A. (2002). Derived environment effects in Optimality Theory. Lingua 112.
243–280.
Mattys, S. and P. W. Jusczyk (in press). Phonotactic cues for segmentation of fluent
speech by infants. Cognition 78. 91–121.
Nagy, N. and B. Reynolds (1997). Optimality Theory and variable word-final deletion
in Faetar. Language Variation and Change 9. 37–55.
Pater, J. (1997). Metrical parameter missetting in second language acquisition. In S.
J. Hannahs and M. Young-Scholten (eds.) Focus on Phonological Acquisition.
Amsterdam: John Benjamins. 235–261.
Pater, J. and J. Paradis (1996). Truncation without templates in child phonology. Proceed-
ings of the Boston University Conference on Language Development. Somerville,
Mass.: Cascadilla Press.
Prince, A. and P. Smolensky (1993). Optimality Theory: constraint interaction in genera-
tive grammar. Technical Report CU-CS-696–93, Department of Computer Science,
University of Colorado at Boulder, and Technical Report TR-2, Rutgers Center for
Cognitive Science, Rutgers University, New Brunswick, N.J. To appear, Blackwell.
Prince, A. and B. Tesar (this volume). Learning phonotactic distributions.
[ROA 353, http://roa.rutgers.edu]
Reynolds, W. (1994). Variation and Phonological Theory. Ph.D. dissertation, University
of Pennsylvania.
Saciuk, B. (1969). The stratal division of the lexicon. Papers in Linguistics 1. 464–532.
Santelmann, L. M. and P. W. Jusczyk (1998). Sensitivity to discontinuous dependencies
in language learners: evidence for limitations in processing space. Cognition 69.
105–134.
Shady, M. E. (1996). Infants’ Sensitivity to Function Morphemes. Ph.D. dissertation,
State University of New York at Buffalo.
Smolensky, P. (1993). Harmony, markedness, and phonological activity. Handout of
keynote address, Rutgers Optimality Workshop-1. [ROA 87, http://roa.rutgers.edu]
Paper presented at the Rutgers Optimality Workshop-1, Rutgers University, New
Brunswick, N.J.
(1995). On the internal structure of Con, the constraint component of UG. Paper
presented at UCLA, Los Angeles, CA. [ROA 86, http://roa.rutgers.edu]
(1996a). The initial state and ‘Richness of the Base’ in Optimality Theory. Technical
Report JHU-CogSci-96-4, Cognitive Science Department, Johns Hopkins Univer-
sity, Baltimore, Md. [ROA 154, http://roa.rutgers.edu]
(1996b). On the comprehension/production dilemma in child language. LI 27.
720–731. [ROA 118, http://roa.rutgers.edu]
(1997). Constraint interaction in generative grammar II: Local Conjunction (or, Ran-
dom rules in Universal Grammar). Paper presented at the Hopkins Optimality
Theory Conference/University of Maryland Mayfest, Baltimore, Md.
Steriade, D. (1993). Closure, release, and nasal contours. In M. Huffman and
R. Krakow (eds.) Nasals, Nasalization, and the Velum. San Diego: Academic Press.
401–470.
(1997). Phonetics in phonology: the case of laryngeal neutralisation. MS., Linguistics
Department, UCLA, Los Angeles.
1. Introduction
Children at the age of 3 are usually credited with almost full mastery of the
sound system of their mother tongue, and they spend the ‘stage of mastery’ be-
tween 3 and 7 becoming perfect speakers.1 This chapter reports on the results of
a range of experiments that aimed to establish whether at the beginning of the
latter stage, at the ages of 3 and 4, Dutch children have full competence (or not)
in the word stress system of their mother tongue. This research question is all
the more interesting because the Dutch word stress system is lexical, in the
sense that, just as in English, the major regularities are disturbed by morpho-
logical factors (i.e., the stress-sensitive vs. stress-neutral affixation distinction)
as well as by lexical exceptions. The first aim of the discussion is to show that
the outcome of the experiments indicates that children of this age indeed appear
‘to know’ this complex system to a considerable degree, and the 4-year-olds
‘know it better’ than the 3-year-olds. Second, the interpretation of the exper-
imental results motivates a new look at the adult system: a significant new
subpattern emerges, surprisingly so in a research area that has been more or
less empirically stable for approximately two decades. This seems an interest-
ing result in its own right, both from the point of view of general methodology
and the study of the language-specific grammar. Third, it is shown that the ‘re-
vised’ Dutch word stress system which follows from these new findings, when
put into an Optimality constraint-based account, has an interesting property:
the account of the new subpatterns requires the active presence in the gram-
mar of a No-Clash constraint, whose services are not immediately apparent
from the traditional data, whether regular or irregular. It is proposed that this
section of the chapter can be viewed as an example of Inkelas’s (1999: 178)
proposal ‘that exceptions play a more central role in, rather than function as a
footnote to, theoretical analysis’. Finally, recognising that currently there are
broadly two ways of dealing with exceptionality, to wit reversed constraint
ranking (McCarthy and Prince 1993) and prespecified underlying forms in-
putted to a single fixed ranking (Pater 1995, Inkelas, Orgun and Zoll 1997,
Inkelas 1999), it will be shown that the active role of No-Clash holds under
370 Wim Zonneveld and Dominique Nouveau
Let us consider some examples of each of these generalisations and, at the same
time, some irregular cases per generalisation.
Words ending in an open syllable have penultimate main stress. The nature
of that penultimate syllable is immaterial: it can be open or closed. Two classes
of exceptions occur: some words have antepenultimate stress, others have final
stress; the former outnumber the latter.
Words ending in a superheavy syllable have final main stress. These syllables
virtually always occur at the right edge of a word (for discussion see Trommelen
1983, Kager 1989, Zonneveld 1993). For this type of word exceptional stress
patterns exhibit penultimate or antepenultimate main stress.
These word-final superheavies aside, Dutch rhymes usually have just two positions:
VX, where X can be the second half of a long vowel, the second
element of a diphthong, or a consonant. It has been argued, therefore, that the
superheavies are surface manifestations of configurations comprising a regular
bipositional VX syllable, followed by an onset introducing a defective syllable
whose nucleus/rhyme is empty (Zonneveld 1993). Under the assumption that
the interpretation of the empty rhyme is ‘minimal’ (i.e., ‘open syllable’), the
stress behaviour of superheavies reduces to the regular penultimate stress pat-
tern of words with final open syllables (stu-dén-tV, for instance). This analysis
will be adopted here. (The exceptions in (2b/c) clearly must have a different
explanation.)
Finally, words ending in closed syllables have two regular main stress pat-
terns: third from the right if the penultimate syllable is open; second from the
right if the penultimate one is closed (or, naturally, if the word is bisyllabic).
Again, two classes of exceptions occur.
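The three regular generalisations can be summarised in a small function. This is only an illustrative sketch of the regular patterns, not of the exceptions: the function name, the three syllable-type labels, and the treatment of monosyllables are assumptions, and the syllable types must be supplied by hand, since the sketch does no syllabification. The example words are the regular forms arena, telefoon, almanak, and Gibraltar.

```python
VALID_TYPES = {"open", "closed", "superheavy"}


def regular_main_stress(syllable_types):
    """Return the 0-based index of the syllable carrying regular main stress."""
    if not syllable_types or any(t not in VALID_TYPES for t in syllable_types):
        raise ValueError("expected a non-empty list of 'open', 'closed', 'superheavy'")
    n = len(syllable_types)
    final = syllable_types[-1]
    if n == 1 or final == "superheavy":
        return n - 1  # final stress (monosyllables: an assumption)
    if final == "open":
        return n - 2  # penultimate stress, whatever the penult looks like
    # Final closed syllable: antepenultimate stress if the penult is open,
    # otherwise (closed penult, or bisyllabic word) penultimate stress.
    if n >= 3 and syllable_types[-2] == "open":
        return n - 3
    return n - 2


examples = {
    "a-re-na": ["open", "open", "open"],
    "te-le-foon": ["open", "open", "superheavy"],
    "al-ma-nak": ["closed", "open", "closed"],
    "Gi-bral-tar": ["open", "closed", "closed"],
}
for word, types in examples.items():
    print(word, "->", word.split("-")[regular_main_stress(types)])
```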
Still pretheoretically, these data can be rearranged so as to bring out, per word
class, a subdivision into regular cases (let us call them A), exceptional ones (B)
and very irregular – but existing – cases (C).
Support for this subdivision comes from a number of areas. A purely statis-
tical underpinning can be found in Trommelen (1991). Non-standard shifts
(e.g., vágina → vagı́na, mónstrans → monstráns) virtually consistently favour
the A-pattern. Fully accepted or common Dutch pronunciations favour an
A-pattern over the pattern of the language of origin: mérci (Fr. mercı́),
Brindı́si (It. Brı́ndisi), Alcála (Sp. Alcalá), Niagára (E. Niágara), Marlbóro
(E. Márlboro), Navratilóva (Cz. Návratilova), Crı́stobal (Sp. Cristóbal), Bólivar
(Sp. Bolı́var), Pótomac (E. Potómac), McÉnroe (E. M[á]cenroe), Manchéster
(E. Mánchester), as do pronunciations of semi-familiar foreign words: babúshka
(Russ. bábushka), mispronunciations when speaking a foreign language: to
commént (E. to cómment), and children’s pronunciations of oddly spelled
words: [ma rɑ thɔ n] (D. márathon [ maratɔ n]).5
In a parametric framework (Kager 1989, Trommelen and Zonneveld 1989,
1999) Dutch is described as the following type of language:
parameter          setting
(un)bounded:       bounded (= binary feet)
foot type:         trochee (as opposed to iamb)
direction:         right-to-left (for foot formation and the word rule)
quantity:          quantity sensitive (closed syllables heavy)
extrametricality:  yes (heavy syllables)6
(5)        x                      x            x                      x
      .   (x    .)     (x    .)  (x    .)     (x    .)  <x>      .   (x)  <x>
      a    re   na      te   le   foo  n       al   ma   nak     Gi  bral  tar
Trochaic feet are assigned from right-to-left. Feet are assumed to be binary, so
stray syllables may occur at the left edge. Closed syllables are always heavy;
they are not allowed to be right branches of feet, and they (exclusively) are
Child word stress competence: an experimental approach 373
extrametrical at the right edge of the word. Underlyingly superheavies are a full
binary foot, as indicated above.
The irregular B- and C-cases are subject to their own respective exceptionality
devices. First consider the B-cases:
(6)   x                 x                           x
     (x    .)  <x>     (x    .)  <x>     (x    .)  (x)
      Ca   na   da      o    li  fant     kla  ri   net
The Cánada pattern is obtained by providing the final (open) syllable with a
gridmark, which – being absent from regular forms – counts against it; then an-
tepenultimate stress is derived by regular extrametricality of that final syllable.
If ólifant is marked as exceptional with regard to its final syllable (which irreg-
ularly remains closed and superheavy), again antepenultimate stress automati-
cally follows by extrametricality. A main-stressed final -VC syllable (klarinét)
can be obtained by marking it irregularly as non-extrametrical.
Finally, consider the C-cases:
(7)             x            x                 x
     (x    .)  (x)      .   (x)  <x>     (x)  (x)  <x>
      cho  co   la      Pro  me  theus    mes  si   as
Dutch 3- and 4-year-olds have full competence in the Dutch word stress system
if the picture of (4) emerges from word stress experiments conducted on them.
This strategy was modelled on that first expressed in Hochberg’s (1986, 1988)
investigation into Spanish, which has a word stress system not dissimilar from
Dutch. Aiming to test ‘the hypothesis that children learning Spanish as a
first language learn rules for assigning stress, as opposed to simply memorizing
stress on a word-by-word basis’ (1988: 683), she found that:
From a developmental perspective, the most striking conclusion is that children
did learn stress rules, although [. . .] doing so was neither necessary nor
straightforward. It seems that children’s propensity to hypothesize linguistic
rules is so strong as to tolerate a high degree of exceptionality. [Moreover],
this study has highlighted a fact of Spanish stress [. . .] that remains to be
accounted for under any current theory. [. . .] The fact that this disfavoring is
relative, not absolute (since some Spanish words of this type do exist) shows
that the notion of degrees of irregularity should be encompassed in future
accounts. (1988: 704–705)
In the study reported on here a very rich system has to be coped with, with
respect to syllable structure and stress types. It is therefore necessary to gather
data which reliably represent the wide range of strings present in that system in
order to get a thorough picture of the child’s knowledge. Two main approaches
usually compete in studies of language acquisition, naturalistic sampling and ex-
perimental settings, and the latter was judged more suitable to the present aims,
the former (patient longitudinal collecting by a caregiver in familiar situations)
giving no guarantee at all that all data of interest would occur. For instance,
words of the same category with different stress types might be either absent
from the data or disseminated over the whole period of study. The experiment
of this contribution focuses on the elicited production of Dutch meaningful and
nonsense words.7 A group of meaningful bi-, tri-, and quadrisyllabic words
was assembled of the types described in the previous section, which could be
assumed to be known by subjects of the pertinent age (words denoting ani-
mals, toys, and sweets, mainly). Testing out these existing words served as a
pilot study to a further test which used nonsense words (again, of the shapes
and sizes described in the previous section) because the most reliable way to
conduct our investigation implied a direct comparison between adult results,
based on adult (‘rule-based’) intuitions, and child results, similarly based on
child intuitions about exactly the same language material. In such a situation
nonsense words are preferable to existing ones because we do not want to run
the risk of the adult (and the child, for that matter) simply retrieving a fully
specified memorised item from his or her lexicon, bypassing the rule system to
be probed.
Thus, the experiment proceeded in two steps. First, 36 children (18 3-year-
olds and 18 4-year-olds) produced the set of meaningful words with the help
of a picture book. The aim was to diagnose whether the degree of regularity of
stress in known items (A > B > C in (4)) had some impact on the children’s
realisations. More precisely, the following hypothesis was assumed:
The more irregular the test words, the more likely children are to make errors in producing
them. Independently, these errors are expected to lead to more regular forms.
Notice that all syllables in these words begin with an onset, this both to ac-
complish structural unity among the word categories and to avoid perceptual
confusion. The meaningful words act as fillers in the adult (reading) test, in the
expectation that errors will be completely absent from them. In the children’s
naming test (pictures of familiar objects which the children have to name) the
goal, on the other hand, was to establish the extent to which the children’s
productions conform to the adult model.
Twenty students (11 females, 9 males) of Utrecht University took part in the
adult reading test, which they performed individually in a quiet room. They
received no information about the goal of the test. Randomised test words
(including a number of meaningful filler words) were presented on separate
cards, and the testees were invited to read these words aloud. These utterances
were recorded; all recordings turned out to be of good quality and all test patterns
perfectly intelligible. The test’s results are depicted in table 1.
(9)
Table 1. Nonsense words test, adults
Importantly, the right edge ‘three syllable window’ restriction is never vio-
lated in those two cases in which the subjects had the opportunity (karabilo,
monitaron). The results for the words ending in open syllables can be taken to
confirm this syllable’s weak metrical status. The glaring exception is the pre-
ferred antepenultimate stress pattern of fénimo. This is probably the result of
a subpattern to the effect that comparatively many words, especially frequent
ones, with the vowel -i- in the penultimate syllable, have antepenultimate main
stress, according to the B-pattern (cf. ánimo ‘zest’, dóminee ‘vicar’, África; see
also Trommelen and Zonneveld 1989: 65–66). The predominant final pattern
for superheavies is confirmed, too. But clearly the most interesting mixture is
constituted by the group of words with final -VC syllables, in which in two
cases even the notion of P(rohibited) appears, i.e., a stress situation impossible
within the system (in these cases because these results violate Quantity Sen-
sitivity). A rearrangement of the data for this particular class is presented in
table 2.
(10) Table 2. Nonsense words test, adults, -VC final words only
                    A     B     C     P
-VC: jakot          9     9
     bizak         14     6
     talaktan      11     7           2
     baralton       7     7           6
     merotak       13     5     2
     dapiton       11     8     1
     monitaron      6     6     8
For all words involved the B-pattern is popular, although significantly never
more so than the A-pattern. For some individual endings it is notoriously dif-
ficult to state which of them is regular (cf. róbot vs. schavót ‘scaffold’, kájak
‘kayak’ vs. tabák ‘tobacco’, rótan ‘rattan cane’ vs. Japán, márathon vs. pelotón
‘pack (of athletes)’), and this shines through in the results. The greatest surprises
are in two separate situations: two trisyllabic words allow P-patterns, and the
(single) quadrisyllabic word has the C-pattern as its most popular. There are two
aspects to the former observation. First, it is odd that violations of Quantity Sen-
sitivity (QS) (tálaktan, báralton)10 are possible at all (and apparently function as
C-patterns). Models for this pattern are virtually completely lacking. On the one
hand, a geographical name like Kázakstan probably ends in a (stress-neutral)
affix, and bíatlon and tríatlon presumably are cases of contrastive stress in a
semantically related pair (just as ímport and éxport, which ought to have final
syllable stress). On the other hand, the stress pattern of foreign placenames such
as Dúbrovnik (Serbo-Croatian) and Gibraltár (Spanish) converges in Dutch on
prefinal syllable main stress: Dubróvnik, Gibráltar.11 Second, skipping a penul-
timate closed syllable (violating QS) seems to be possible (cf. table 1) just when
the final syllable is closed, too, not when it is open. In fact, penultimate stress on
a ‘closed-open’ final sequence is completely exceptionless in the test, a remark-
able fact given the possibility of irregular main stress on final open syllables.
The case of quadrisyllabic monitáron is strange in view of the overall system, but
may, generally speaking, be due to a preference for rhythmically balanced struc-
tures (two times two syllables, rather than one followed by three). So far this pat-
tern as such is a C-pattern in terms of (4), and exists in the language, albeit char-
acteristically in highly unfamiliar words: Melchizédek, Archimédes, oxymóron.
Thus, the net result of the nonsense words test with adult native speakers is
twofold. First, the picture in (4) as a summary of Dutch word stress patterns
is broadly speaking correct, but some of the details of (10) must be kept in
mind, too. In this sense, this corpus of adult test results will serve as a frame
of reference for the analysis of the children’s imitations. Second, the hypoth-
esis about the state of full competence in the Dutch word stress system for 3-
and 4-year-olds will have to be modified accordingly, from this stage onwards
pertaining to (4) and (10) combined.
no. is the sheer amount of expected output words, O is the output actually found,
C the number of fully correct words, E the ones with one or more mistakes in
them (simply O-minus-C).
(12) Table 3. Existing words test, children (age 3–4)
3yo 4yo
no. O C % E O C % E
Clearly 4yos are better as performers of the test than 3yos are, for each pair of
equivalent cells, but this is something different from claiming that they have
better word stress competence. If there is a meaningful component in this table,
it is that the corresponding figures of the O-columns become higher, while at
the same time those of the E-columns become lower. These two tendencies
combined are expressed in the two columns of percentages, of C/O. It must
be admitted, however, that the differences between these two latter columns,
although clearly present, are connected to extremely low figures of E.
Table 3 indicates that the children made 34 errors: 21 plus 13. Output was
counted as ‘erroneous’ when it contained a deviation from the adult model in
a manner potentially affecting the stress type of the word: this could be a pure
stress shift, a change in the weight of a syllable (open vs. closed), or a change in
the number of syllables in the word (more or fewer). Among the actual 34 errors
two types were found: rhyme adaptations (changes of weight), and truncations
of non-primary stressed syllables. No stress shifts were found. Most of the errors
turned out not to affect the stress type of the word. Thus, child output [róbo], a
rhyme adaptation, is of the same type (A) as the target word róbot. And [páma]
and [plaplý], both truncations, are of the same types (A and C, resp.) as the
target words pyjáma and paraplú. Just 4 errors, a very small number indeed,
implied a change of type, all of them from either B or C to A, as hoped for:
lexical items. The aims of the experiment were these. From a developmental per-
spective it investigated the extent to which these children’s output is identical to
that of adults (presented in (9)), and the extent to which the children differ among
themselves in age groups. The distribution of errors among stress types was to
be explored, just as the hypotheses that children will make more errors in irreg-
ular words than in regular ones, and that their errors tend towards regularisation
of stress.
Forty children took part in this second test, from two daycare centres and
two nursery schools in Nijmegen and Utrecht. There were 20 3yos (6 boys and
14 girls) and 20 4yos (8 boys and 12 girls).14 The tests were conducted under
circumstances equivalent to those of the ‘meaningful word test’. Since the chil-
dren’s attention had to be held during the whole experiment, they were invited
to join in simple games. For each game different materials were used (objects,
pictures of imaginary animals, puppets), but the elicitation procedure was al-
ways the same. Children were instructed that they: (i) would be presented with
unknown objects; (ii) had to listen carefully to the names of these objects; and
(iii) would have to give the names when asked for by the investigator. The session
started with a short practice phase with familiar objects to ensure that the child
understood the imitation task and was aware of the importance of waiting for the
investigator’s request before realising the imitation. In the first step of a game,
the interviewer mentioned the name of an object twice, using a word out of the
list of nonsense words. She was trained to pronounce the stimuli clearly without
exaggerations with regard to stress. Then, a mechanical direct imitation of the
stimulus by a child was avoided by means of a stereotyped question: ‘X, do you
know what the name of this object is?’ The child was expected to answer yes; the
interviewer would then ask: ‘What is its name, then?’, and the child would say the
name. When for some reason deviations from this procedure occurred, new imi-
tations of the relevant items were elicited at the end of the session. The taped data
were analysed independently by three transcribers (one of whom was Nouveau),
using broad IPA notation. Transcriptions were compared, and only those on
which the transcribers agreed obtained a pass. Disagreement in the interpreta-
tion of the data seldom occurred, however, and could almost always be solved
by reconsidering the recordings.
The basic material of the test was that of (8), minus the bracketed items.
There were many more than just 14 test items, however, because the full list
contained per item as many variants as there were syllables. This gave the 39
items below in (14):
(14)  A           B           C           P
-     bóla                    bolá
      fenímo      fénimo      fenimó
      kanákta                 kanaktá     kánakta
      karabílo    karábilo    karabiló    kárabilo
-     jákot       jakót
      dápiton     dapitón     dapíton
      taláktan    talaktán                tálaktan
      monítaron   monitarón   monitáron   mónitaron
-     bokáat      bókaat
      karimóon    kárimoon    karímoon
      karéi       kárei
      doliméi     dólimei     dolímei
      kadónt      kádont
      falimónt    fálimont    falímont
Each age group consists of 20 subjects who each imitated the same 39 stimuli.
For either corpus there are a total of 780 imitations, involving 700 words with
authorised stress patterns, and 80 with prohibited stress. We shall conclude that
the children of the test have mastered the Dutch word stress system when (i) their
error rates conform to the markedness rating in (14) (A < B < C < P); and (ii) the
errors they make generally go from ‘bad’ to ‘less bad’. We shall use information
on the quality of imitations (identical vs. in some way erroneous) to measure
error rates in all data representing a certain stress type, and then in each item.
We shall also measure whether errors in individual items lead more frequently
to a regularisation (or not). First, these measurements will be done for each age
group separately; subsequently we shall proceed to a comparison of the results,
which will help us detect developmental tendencies in the acquisition of stress.
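The counts mentioned here (39 stimuli, 780 imitations per age group, of which 700 authorised and 80 prohibited) follow from the item list in (14). A brief arithmetic check, in which the dictionary and variable names are illustrative assumptions:

```python
# Syllable counts of the 14 base items, read off the forms listed in (14).
SYLLABLES = {
    "bola": 2, "fenimo": 3, "kanakta": 3, "karabilo": 4,
    "jakot": 2, "dapiton": 3, "talaktan": 3, "monitaron": 4,
    "bokaat": 2, "karimoon": 3, "karei": 2, "dolimei": 3,
    "kadont": 2, "falimont": 3,
}
# Items whose variants in (14) include a prohibited (P) stress pattern.
HAS_PROHIBITED_VARIANT = {"kanakta", "karabilo", "talaktan", "monitaron"}
SUBJECTS_PER_GROUP = 20

stimuli = sum(SYLLABLES.values())          # one stress variant per syllable
imitations = stimuli * SUBJECTS_PER_GROUP  # per age group
prohibited = len(HAS_PROHIBITED_VARIANT) * SUBJECTS_PER_GROUP
authorised = imitations - prohibited

print(stimuli, imitations, authorised, prohibited)  # 39 780 700 80
```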
Table 4 gives a summary of the test results per stress type and per age group,
where the percentages are based on the number of errors. (Notice that this time
output was always obtained for any item for any child.)
(15) Table 4. Broad results nonsense words test for 2 age groups:
3yos and 4yos
Age
group Stress types
The sum total of errors for the 4yos is not much lower than that for the 3yos.
Yet the percentages express a virtually perfect relationship between the degree
of stress regularity of nonsense words and the rate of errors made during the
imitation task. The results reflect the markedness hierarchy (A < B < C <
P). Regular nonsense words (type A) were easier to imitate than their irregular
counterparts (B and C), while prohibited patterns (P) were more difficult than
all other types.
These scores may be taken as a first indication that children by the age of
3 already have internalised much of the stress system of their mother tongue,
since they appear to be sensitive to various degrees of regularity in the system.
As a general tendency, 4yos made fewer errors in imitations within each stress
type than the 3yos, with the exception of type C. Moreover, these scores provide
strong independent support for the subdivision of Dutch stress patterns in the
categories A, B, C, and P, as employed in this chapter.
Since these general scores support the prediction that the error rate depends
on the regularity of the stress patterns, table 4 will now be set off against the
whole range of data, separated out into the two age groups. After that, the results
will be combined and an attempt will be made to detect developmental patterns.
                        Stress Location
Stimulus    Pre-ant.   Antep.     Penult     Final
bola                              – (A)      35% (C)
jakot                             15% (A)    30% (B)
bokaat                            70% (B)    15% (A)
karei                             40% (B)    10% (A)
kadont                            30% (B)    10% (A)
fenimo                 30% (B)    10% (A)    40% (C)
kanakta                75% (P)    10% (A)    65% (C)
dapiton                20% (A)    40% (C)    15% (B)
talaktan               65% (P)    30% (A)    70% (B)
karimoon               60% (B)    90% (C)    50% (A)
dolimei                15% (B)    45% (C)    45% (A)
falimont               40% (B)    30% (C)    35% (A)
karabilo    65% (P)    30% (B)    20% (A)    65% (C)
monitaron   50% (P)    35% (A)    45% (C)    55% (B)
As indicated in bold, there are 5 cases of deviation from the expected patterning.
Before we go into them, let us have a look at some general tendencies that stand
out in this table.
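The deviating items can also be recovered mechanically from the figures in the table above. The following is a sketch under two assumptions: the dash for bóla is read as an error rate of zero, and "expected patterning" is taken to mean that error rates do not decrease along A, B, C, P (ties allowed). All names are illustrative.

```python
ORDER = "ABCP"  # expected: error rate of A <= B <= C <= P

# Error rates (%) of the 3-year-olds per stress type, copied from the table above.
# Assumption: the dash for bola (A) is read as 0.
RATES_3YO = {
    "bola": {"A": 0, "C": 35},
    "jakot": {"A": 15, "B": 30},
    "bokaat": {"B": 70, "A": 15},
    "karei": {"B": 40, "A": 10},
    "kadont": {"B": 30, "A": 10},
    "fenimo": {"B": 30, "A": 10, "C": 40},
    "kanakta": {"P": 75, "A": 10, "C": 65},
    "dapiton": {"A": 20, "C": 40, "B": 15},
    "talaktan": {"P": 65, "A": 30, "B": 70},
    "karimoon": {"B": 60, "C": 90, "A": 50},
    "dolimei": {"B": 15, "C": 45, "A": 45},
    "falimont": {"B": 40, "C": 30, "A": 35},
    "karabilo": {"P": 65, "B": 30, "A": 20, "C": 65},
    "monitaron": {"P": 50, "A": 35, "C": 45, "B": 55},
}


def deviates(rates):
    """True if a more marked type has a lower error rate than a less marked one."""
    ordered = [rates[t] for t in ORDER if t in rates]
    return any(earlier > later for earlier, later in zip(ordered, ordered[1:]))


deviating = [item for item, rates in RATES_3YO.items() if deviates(rates)]
print(deviating)
```

Under these assumptions the check singles out five items (dapiton, talaktan, dolimei, falimont, monitaron), the same number as the deviations mentioned in the text.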
All results of the top half conform to the expectations; more generally, two
types of data do so: bisyllabic words, and words (of all sizes) ending in open
syllables. Interestingly, the latter type includes the case of fenimo, for which
these 3yos ‘prefer’ penultimate stress, where adults had antepenultimate stress
(cf. table 1 in (9)); apparently, the interfering role of -i- is not (yet) part of
these children’s grammar. In the bottom half, karimóon is a comparatively
unproblematic pattern. Among the 5 more problematic cases, A is still the best
pattern in 2 words: taláktan and monítaron. The deviation in the case of dapiton
is only slight, since type B scores 3 errors out of 20 imitations, whereas type A
scores 4. With respect to falimont, some uncertainty seems to reign as to the best
stress pattern: range 30 to 40 percent. Furthermore, it seems that by favouring
antepenultimate stress in dolimei (actually type B), children overgeneralise the
regular pattern of trisyllabic words with final closed syllables to those with a
final diphthong. The results for these latter two items could indicate that 3yos
do not yet master complex rhymes very well, such as superheavies ending in
clusters, and diphthongs.
It can also be observed, however, that even though P(rohibited) patterns
scored a very high error rate across-the-board (range 50 to 75 percent), which is
just as expected, this is not always the highest error rate for each individual word.
The dissenting examples are *mónitaron and *tálaktan, which were both easier
to imitate than their final-stressed counterparts (which are type B). Similarly,
both in non-conformist and completely conformist examples, these children
often have difficulty in imitating final stress: one way to look upon the monitaron
results is to say that the B-pattern is difficult, and kanakta and karabilo have an
error rate of 65 percent for their C-pattern.
Table 6 gives the results of the same test, administered to the 4yos.
                        Stress Location
Stimulus    Pre-ant.   Antep.     Penult     Final
bola                              - (A)      45% (C)
jakot                             - (A)      5% (B)
bokaat                            40% (B)    25% (A)
karei                             50% (B)    15% (A)
kadont                            30% (B)    25% (A)
fenimo                 10% (B)    25% (A)    30% (C)
kanakta                65% (P)    5% (A)     80% (C)
dapiton                5% (A)     45% (C)    10% (B)
talaktan               45% (P)    25% (A)    40% (B)
karimoon               40% (B)    80% (C)    30% (A)
dolimei                25% (B)    55% (C)    10% (A)
falimont               45% (B)    45% (C)    20% (A)
karabilo    70% (P)    30% (B)    5% (A)     50% (C)
monitaron   60% (P)    35% (A)    65% (C)    55% (B)
A 14 9 1 4
B 12 6 3 3
C 9 3 6
P 4 2 2
              3yos                       4yos                   adults
              antep.   pen.     fin.     antep.  pen.   fin.    antep.  pen.   fin.
ka-nak-ta     75(P)    10(A)    65(C)    65      5      80      –       20     –
ma-rot-ko                                                       –       20     –
ta-lak-tan    65(P)    30(A)    70(B)    45      25     40      2       11     7
ba-ral-ton                                                      6       7      7
The figures indicate percentages of error rates for the children; for the adults
they provide the absolute number of times a given stress pattern was used in
the 20 subject adults pre-test (indicating the relative popularity of that pattern
among adults). For talaktan, the expected correlation between the two sets
of data (children vs. adults) is introduced at the age of 4, although tálaktan
and adult báralton (both P) remain unexpectedly popular. For kanakta the ex-
pected correlation exists at all ages, with a strong dislike for anything other than
penultimate stress at all stages. These results will be readdressed in the next
section.
The overall finding that children’s errors were more frequent in irregular
and prohibited patterns than in regular ones cannot, of course, be confirmed
without an investigation of error directionality. The question is: when children
make imitation errors, do these lead to a regularisation or an irregularisation
of stress? All arguments in favour of the markedness categorisation presented
above would be undermined if it were discovered that changes in the imitations
tend to result in a deterioration of stress status. Conversely, our results so far
will receive strong support if errors improve irregular and prohibited words,
and leave already regular words – more or less – intact. In (20) are examples of
the types of errors that were exhibited in our data:
An obvious way for the child to deal with an irregular stress pattern is by
stress shift: this will always lead to a change in stress status. Rhyme shape
adaptation affects syllable weight, and may or may not have repercussions for
stress type. In our data, most changes modifying word length are truncations,
although cases of syllable addition are found too. Both of these latter kinds of error may
affect the stress status of imitations. Truncated forms concern items with pre-
antepenultimate stress (e.g., kábilo for *kárabilo), antepenultimate stress (dáton
for dápiton), penultimate stress (níno for fénimo), and final stress (káat for
bokáat). This process affects non-final unstressed open syllables. That is: it typ-
ically occurs under the same circumstances as truncation in early child speech
(Echols and Newport 1992, Lohuis-Weber and Zonneveld 1996, Kehoe and
Stoel-Gammon 1997, Pater 1997), and under conditions very similar to vowel
reduction in the adult language (Kager 1989, Trommelen and Zonneveld 1989,
1999).
However, occasionally truncation may also affect final syllables, which are
usually immune to it. Such a modification could be observed in the four forms
in (21):
The different character of this process is illustrated by the fact that it affects
both open and closed syllables. It always occurs under extreme circumstances,
when (P)rohibited stimuli (violations of the three-syllable-window restriction,
or of Quantity Sensitivity) yield an authorised (A or B) stress pattern.
Errors in imitations can be classified into 3 classes (better – same – worse)
using the P > C > B > A scale of stress types. Obviously, ‘better’ then stands for
an error which makes a test item more regular, ‘same’ refers to one that does
not affect the classification, and ‘worse’ characterises errors which make an
item more irregular. Below, both for the 3yos and the 4yos, tables are presented
which contain the percentages for effects of errors calculated out of the total
number of errors found in either corpus. In the majority of cases irregular
patterns are expected to become regular, and prohibited patterns to be changed
into authorised ones, and not vice versa.
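The three-way classification can be stated compactly. In the sketch below the function and variable names are assumptions made for illustration; only the A/B/C/P scale and the labels better, same and worse come from the text.

```python
# Stress types ordered from most regular (A) to prohibited (P).
SCALE = ["A", "B", "C", "P"]


def error_effect(target_type, output_type):
    """Classify an imitation error by comparing the stress type of the
    target with the stress type of the child's output."""
    target = SCALE.index(target_type)
    output = SCALE.index(output_type)
    if output < target:
        return "better"  # the item became more regular
    if output > target:
        return "worse"   # the item became more irregular
    return "same"        # the classification is unaffected


print(error_effect("P", "A"))  # better
print(error_effect("A", "A"))  # same
print(error_effect("B", "C"))  # worse
```

An error such as [róbo] for róbot, discussed earlier, would count as same, since both target and output are of type A.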
the prespecifications need not be the same). Nouveau provides the former type
of analysis, but Van Oostendorp (1997), in a review of her work, gives the
latter. The aim of this section will be twofold: first, as pointed out in the in-
troduction, to present a case in which a well-known substantial metrical con-
straint (No-Clash) turns out to be active in the grammar, its active status
motivated specifically by exceptional forms whose relevance only transpired
through our experiments; and second, to propose restrictions on the extent to
which different constraint rankings may potentially coexist in a single gram-
mar. Similarly, the next section will have a twofold aim: first, to show that
the choice of exceptionality theory does not appear to affect the argument
concerning the unexpected role of No-Clash; and second, that the single
published ‘prespecified underlying form’ analysis of Dutch has a number of
drawbacks that make it difficult to accept it as a contender, at this stage of
development.
Case (iv), with initial stress followed by a lapse, shows that Align-Edge-
L dominates All-Ft-R. Case (vi), a precious example from the fringes of
the Dutch lexicon, shows that All-Ft-R is the proper generalisation for foot
distribution rather than its mirror image twin.17 Finally, Head-R dominates
Align-Edge-L in order to ensure prefinal stress in case (ii). (Thus, combined:
Non-Fin » Head-R » Align-Edge-L » All-Ft-R.)
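The way pairwise ranking arguments combine into a single hierarchy can be sketched as a topological sort. The function name and data layout below are assumptions made for illustration; the three domination pairs are those stated in this paragraph.

```python
def total_ranking(dominations):
    """Combine pairwise 'X dominates Y' statements into one ranking.
    If the statements leave the order partly open, one compatible
    linearisation is returned."""
    constraints = []
    for higher, lower in dominations:
        for constraint in (higher, lower):
            if constraint not in constraints:
                constraints.append(constraint)
    dominated_by = {c: set() for c in constraints}
    for higher, lower in dominations:
        dominated_by[lower].add(higher)

    ranking = []
    while len(ranking) < len(constraints):
        # A constraint may be placed once everything dominating it is placed.
        free = [c for c in constraints
                if c not in ranking and dominated_by[c] <= set(ranking)]
        if not free:
            raise ValueError("contradictory ranking arguments")
        ranking.append(free[0])
    return ranking


PAIRS = [
    ("Align-Edge-L", "All-Ft-R"),  # case (iv)
    ("Head-R", "Align-Edge-L"),    # case (ii)
    ("Non-Fin", "Head-R"),         # as given in the combined ranking
]
print(" » ".join(total_ranking(PAIRS)))
# Non-Fin » Head-R » Align-Edge-L » All-Ft-R
```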
Finally, the words ending in -VC syllables must be accounted for. Recall that
two regular patterns here are differentiated by the nature of the penultimate
syllable: main stress is on the penult when it is closed (Gibráltar), and on
the antepenult when the penult is open (álmanak). The first pattern is a direct
consequence of the current analysis:
Following Prince (1980), Kager (1989), and Nouveau (1994), Bin(arity) is pro-
posed to cover two types of feet: (i) bisyllabic ones, independently of whether
these are open (light) or closed (heavy); and (ii) single specimens of closed
(heavy) syllables. Interestingly *(d) is a near-perfect representation that vio-
lates neither Bin nor Parse-Syl: rejecting it can be ensured by a constraint
hierarchy assumption: Head » Parse-Syl. This is a first point where the two
constraint strands interlock. Notice that two different structures with penulti-
mate stress survive; for the time being (we shall reconsider this point immedi-
ately below), low-ranked All-Ft-R prefers (c) to (b).
The álmanak pattern requires the addition of some notion of weight sensitiv-
ity: WSP (the Weight-to-Stress Principle) bans any candidate that has a closed
(H) syllable in a non-head position (i.e., it bans L-H and H-H feet, and unparsed
H), cf. (26):
(26)
(a) *  (al-ma)(nák)     Non-Fin (» Head-R) violated, WSP not violated
(b) *  (al)(má-nak)     WSP violated (assuming WSP » Head-R)
(c) *  (al)(má)(nak)    Bin violated (assuming Bin » Head-R)
(d) ☺  (ál-ma)(nak)     Non-Fin, Bin, WSP not violated
Comparison of candidates (b) and (d) establishes WSP » Head-R, and com-
parison of (c) and (d) establishes Bin » Head-R (both in order not to derive
penultimate stress, simply by Non-Fin » Head-R). The former of these new
rankings (WSP » Head-R) affects (25). In (25), candidates (c) and (d) will now
be eliminated at a relatively early stage by WSP, and (b) will be optimal. When
*(d) is eliminated by early-ranked WSP, the ranking of Head-R » Parse-Syl
is no longer motivated by this candidate. But (25) clearly shows that Parse-
Syl should still be kept in check by WSP » Parse-Syl (in order for WSP to
eliminate *(d)).
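The evaluation in (26) can be replayed with a minimal strict-domination evaluator. This is a sketch, not the chapter's formalisation: the violation marks for Non-Fin, WSP and Bin follow (26), whereas the Head-R marks are an assumption (only candidate (d) is given one), and the mutual order of Non-Fin, Bin and WSP is chosen arbitrarily because it does not affect the outcome here.

```python
def optimal(candidates, ranking):
    """Return the candidate with the best violation profile when constraints
    are compared in ranking order (strict domination)."""
    def profile(name):
        return tuple(candidates[name].get(c, 0) for c in ranking)
    return min(candidates, key=profile)


# Violations for the candidates of (26). Head-R marks are an assumption:
# only (d), whose main stress is not in the rightmost foot, receives one.
ALMANAK = {
    "(al-ma)(nák)": {"Non-Fin": 1},   # (a)
    "(al)(má-nak)": {"WSP": 1},       # (b)
    "(al)(má)(nak)": {"Bin": 1},      # (c)
    "(ál-ma)(nak)": {"Head-R": 1},    # (d)
}
RANKING = ["Non-Fin", "Bin", "WSP", "Head-R"]

print(optimal(ALMANAK, RANKING))  # (ál-ma)(nak)
```

Comparing tuples of violation counts from left to right is one simple way to model strict domination: a single violation of a higher-ranked constraint outweighs any number of violations lower down.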
This concludes the account of the A-patterns of Dutch word stress. The
grammar that takes care of them has the following form:
As indicated, Head-R » Parse-Syl could still be true: this will indeed turn
out to be the case below, for a novel reason.
Trochee can be assumed to be undominated. Moreover, at the top of the
list Head-R is preceded by all of Non-Fin, Bin, and WSP, but in fact
a more intricate relation holds between these latter three constraints. This is
shown by the case of róbot, a bisyllabic word with L-H structure (the final
syllable is extrametrical in the parametric analysis, and the word would have
antepenultimate stress if longer, in a manner of speaking). A word of this type
has three reasonable candidates, each violating its own constraint, of which one
is blatantly ungrammatical:
below Head-R. The differences between these two types of ranking reversal
seem to correspond well to their respective characterisations as B- and C-type
irregular forms.20
Third, the C-pattern for VC-final words, of which Celébes from (3) is an
example, receives the following analysis: the regular WSP » Head-R ranking
is reversed to Head-R » WSP.21
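The effect of this reversal can be illustrated with two schematic candidates. Both the footings and the violation marks below are assumptions, modelled on candidates (b) and (d) of (26) rather than taken from a tableau for Celébes; the helper names are likewise illustrative.

```python
def optimal(candidates, ranking):
    """Best candidate under strict domination (leftmost constraint wins)."""
    def profile(name):
        return tuple(candidates[name].get(c, 0) for c in ranking)
    return min(candidates, key=profile)


def swap(ranking, first, second):
    """Return a copy of the ranking with two constraints exchanged."""
    new = list(ranking)
    i, j = new.index(first), new.index(second)
    new[i], new[j] = new[j], new[i]
    return new


# Schematic candidates (assumed): antepenultimate stress violates Head-R,
# penultimate stress places the closed final syllable in a weak position (WSP).
CELEBES = {
    "(Céle)(bes)": {"Head-R": 1},
    "Ce-(lébes)": {"WSP": 1},
}
REGULAR = ["WSP", "Head-R"]
REVERSED = swap(REGULAR, "WSP", "Head-R")

print(optimal(CELEBES, REGULAR))   # (Céle)(bes)
print(optimal(CELEBES, REVERSED))  # Ce-(lébes)
```

On these assumptions the regular ranking yields the antepenultimate pattern, and the lexically marked reversal yields the penultimate C-pattern.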
Finally, we follow Nouveau (1994) in dealing with Nícolaas and other ex-
ceptional words among those ending in superheavy syllables (cf. (2b)) in the
following two-step way. First, the irregular promotion of Non-Fin-Ft implies
the selection of [(Níco)(la-sV)] over the candidate with main stress in the final
foot. Second, [Ni-(cóla)-sV)] is eliminated by requiring that defective syllables
must always be footed, but cannot be heads of feet. These proposals imply that
the C-pattern example Prométheus cannot be generated: it will be a P-pattern
word; this is not necessarily a bad result in view of its unique status (cf. (4) and
n. 4).22
The different ranking reversals that together constitute the Optimality analy-
sis of irregular Dutch word stress patterns in that framework are the following
(leftward arrow means ‘promoted’, rightward arrow means ‘demoted’):
This display clearly contains a far from arbitrary collection of ranking reversals.
Rather, a number of potentially interesting (and, we submit, possibly univer-
sal) restrictions on this otherwise powerful mechanism appear to apply. Two
notions are central to these restrictions: that of a ‘pivot’ in the system, and
that of the ‘minimal number of constraints’ allowed to create a distance be-
tween the regular system and irregular forms. First, the ‘designated pivot’ of
all ranking reversals is the Head-R constraint: all demotions occur relative
to it to the ranking position immediately after it, and likewise all promotions
occur relative to it to the ranking position immediately before it. No other re-
versals are required. This function of Head-R should not surprise us, because
in a sense this constraint may be called the ‘functionally central’ one of the
system, word stress being known to be an edge phenomenon since Trubetzkoy
(1929: ch. II: ii). Second, in all cases (but one) just a single constraint is pro-
or demoted to capture the irregular class. All B-irregularities involve a sin-
gle constraint of the Non-Fin family, all C-irregularities either involve the
single WSP constraint, or Non-Fin and Bin demoted as a pair. This latter
(30)
(a) Experimentally obtained X-H-X data.
              3yos                       4yos                   adults
              antep.   pen.     fin.     antep.  pen.   fin.    antep.  pen.   fin.
ka-nak-ta     75(P)    10(A)    65(C)    65      5      80      -       20     -
ma-rot-ko                                                       -       20     -
ta-lak-tan    65(P)    30(A)    70(B)    45      25     40      2       11     7
(b) A B C P
a-gén-da - - [ká-nak-ta]
fri-kan-dó (so far C)23
gi-brál-tar bom-bar-dón [tá-lak-tan] (so far P)
- (tálak)(tan) * ** (gíbral)(tar)
(tá)(lak)(tan) * ** (gí)(bral)(tar)
(tá)(laktan) * * ** (gí)(braltar)
- ta-(láktan) * * gi-(bráltar)
☺ ta-(lák)(tan) * gi-(brál)(tar)
- ta-(lak)(tán) * gi-(bral)(tár)
(talak)(tán) * * (gibral)(tár)
As shown, it is WSP that makes the crucial decision, although Head-R could
have made it too (recall that WSP » Head-R is demanded by the álmanak
case in (26)). Exceptional final stress (talak-tán) can be produced by demoting
- (kának)(ta) * * ** (ágen)(da)
(kának)-ta * ** (ágen)-da
(ká)(nak)(ta) ** ** (á)(gen)(da)
(ká)(nak)-ta * ** (á)(gen)-da
(ká)(nakta) * ** (á)(genda)
- (ka)(nákta) * * (a)(génda)
☺ ka-(nákta) * a-(génda)
☺ ka-(nák)-ta * a-(gén)-da
(ka)(nák)-ta * * (a)(gén)-da
(ka)(nák)(ta) ** * (a)(gén)(da)
- (kanak)(tá) * * * (agen)(dá)
ka-(nak)(tá) * * a-(gen)(dá)
(ka)(nak)(tá) ** * (a)(gen)(dá)
Again WSP makes the crucial decision, although Head-R could have made it
too. Interestingly, even if Non-Fin-Ft is introduced into this tableau assuming
ranking reversal, penultimate stress will prevail:
(ágen)-da * **
a-(génda) * *
☺ a-(gén)-da *
If antepenultimate stress is excluded for this case, the next question is: can
final-stressed kanaktá (= fricandéau = agendá) be blocked too? Unfortunately,
the answer is: not really. Irregular final main stress on an L-syllable is accounted
for by simultaneous demotion of Bin and Non-Fin, and (33) makes clear that
under these circumstances there is a form (the bottom one) that violates none of
the constraints WSP, Parse-Syl and Head-R. This form will be the winner.
This threatens to be a second unsatisfactory result, but before we accept this,
consider the following.
In itself, final main stress on an L-syllable is available as a C-pattern for cases
such as chocolá; it is just kanaktá (= agendá) that we are trying to block. Is
there a constraint that we can invoke that has the following two properties: it
distinguishes between these two cases, and it trivialises evaluations by WSP,
Parse-Syl and Head-R? In fact, the difference in syllable shape between
the two cases suggests what is going on. A constraint will be invoked which so
far has played no role whatsoever in the analysis, and which interestingly will
be rankable in absolute topmost position: it will be unviolated. The constraint
is that in (35):
- (tálak)(tan) * ** (gíbral)(tar)
(tá)(lak)(tan) * * ** (gí)(bral)(tar)
(tá)(laktan) * * * ** (gí)(braltar)
☺- ta-(láktan) * * * gi-(bráltar)
ta-(lák)(tan) * * * gi-(brál)(tar)
- ta-(lak)(tán) * * * gi-(bral)(tár)
(talak)(tán) * * (gibral)(tár)
The optimal candidate is now ta-(láktan), with penultimate stress as the result of
a final trochaic foot: ta-(lák)(tan) contains a clash, and is eliminated. This new
set-up again implies that Parse-Syl must be deactivated by Head-R » Parse-Syl,
as hinted at earlier.
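The filtering effect of No-Clash on the ta-lak-tan candidates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the definition of the constraint is not reproduced in the text above, so the sketch assumes that No-Clash penalises two adjacent foot heads, and that every foot is a trochee whose first syllable is its head. The names `head_flags`, `no_clash_violations` and `CANDIDATES` are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# A candidate is a list of units: a tuple is a foot, a bare string is an
# unparsed syllable. Assumption: every foot is a trochee, so the first
# syllable of each foot is its head.
Unit = Union[str, Tuple[str, ...]]


def head_flags(candidate: List[Unit]) -> List[bool]:
    """Return one flag per syllable: True if the syllable is a foot head."""
    flags: List[bool] = []
    for unit in candidate:
        if isinstance(unit, tuple):
            flags.extend([True] + [False] * (len(unit) - 1))
        else:
            flags.append(False)
    return flags


def no_clash_violations(candidate: List[Unit]) -> int:
    """Count pairs of adjacent foot heads (assumed reading of No-Clash)."""
    flags = head_flags(candidate)
    return sum(1 for left, right in zip(flags, flags[1:]) if left and right)


# The seven footings discussed for ta-lak-tan. Main stress is encoded only
# in the label; it plays no role in the clash count.
CANDIDATES: Dict[str, List[Unit]] = {
    "(tálak)(tan)": [("ta", "lak"), ("tan",)],
    "(tá)(lak)(tan)": [("ta",), ("lak",), ("tan",)],
    "(tá)(laktan)": [("ta",), ("lak", "tan")],
    "ta-(láktan)": ["ta", ("lak", "tan")],
    "ta-(lák)(tan)": ["ta", ("lak",), ("tan",)],
    "ta-(lak)(tán)": ["ta", ("lak",), ("tan",)],
    "(talak)(tán)": [("ta", "lak"), ("tan",)],
}

survivors = [
    name for name, cand in CANDIDATES.items() if no_clash_violations(cand) == 0
]
print(survivors)
```

Under these assumptions three candidates survive, one for each stress position, which is the situation the text goes on to exploit; ta-(lák)(tan) is among the candidates removed.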
The two irregular patterns are derived as follows. First, when Non-Fin is
demoted to immediately below Head-R, final stress follows (bottom candidate
of (36)). Second, precisely because ta-(lák)(tan) is eliminated by No-Clash,
irregular antepenultimate stress in tálaktan can be accounted for
by promoting Non-Fin-Ft to immediately above Head-R (just as in the
case of Cána-da). This can be seen by considering the three candidates that
survive No-Clash in (36): introducing Non-Fin-Ft » Head-R into that
tableau makes [(tálak)(tan)] the optimal output. This suggests that a
reasonable new version of the Dutch pro- and demotion schema of (29) is that
below:
(37)           A        B1           B2             C
     -VV:      = (27)                Non-Fin-Ft ⇐   Non-Fin, Bin ⇒
     -VC:      = (27)   Non-Fin ⇒    Non-Fin-Ft ⇐   WSP ⇒
     -VXC:     = (27)                Non-Fin-Ft ⇐   n.a.
     (all pro- and demotions relative to Head-R)
Demotion of Non-Fin is vacuous in the remaining two B1 cases, given foot
binarity (which excludes main stress on a final open syllable). Non-Fin-Ft
consistently takes care of all cases of irregular antepenultimate stress. Demotion
of constraints other than Non-Fin or Non-Fin-Ft is involved in irregularity
at the fringes of the system.
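The reranking logic described for ta-lak-tan (demotion of Non-Fin, promotion of Non-Fin-Ft, both relative to Head-R) can be illustrated with a small strict-domination evaluator. The sketch is hypothetical in several respects. The violation counts are not copied from tableau (36); they follow from simplified stand-in definitions assumed here: Non-Fin penalises main stress on the final syllable, Non-Fin-Ft penalises a word-final head foot, WSP penalises each heavy syllable that is not a foot head, Head-R counts the syllables between the main-stressed syllable and the right word edge, and Parse-Syl counts unfooted syllables. The position of Non-Fin-Ft at the bottom of the regular ranking is likewise an assumption. Only the three candidates that satisfy No-Clash are considered.

```python
from typing import Dict, List

# Assumed violation profiles for the three clash-free candidates
# (illustrative stand-in definitions, not the chapter's own tableau).
PROFILES: Dict[str, Dict[str, int]] = {
    "(tálak)(tan)": {"Non-Fin": 0, "Non-Fin-Ft": 0, "WSP": 1, "Head-R": 2, "Parse-Syl": 0},
    "ta-(láktan)": {"Non-Fin": 0, "Non-Fin-Ft": 1, "WSP": 1, "Head-R": 1, "Parse-Syl": 1},
    "(talak)(tán)": {"Non-Fin": 1, "Non-Fin-Ft": 1, "WSP": 1, "Head-R": 0, "Parse-Syl": 0},
}


def evaluate(ranking: List[str], profiles: Dict[str, Dict[str, int]]) -> str:
    """Pick the optimal candidate under strict domination.

    Violation vectors are ordered by the ranking and compared
    lexicographically, so a higher-ranked constraint always outweighs
    any number of violations of lower-ranked ones.
    """
    return min(profiles, key=lambda name: tuple(profiles[name][c] for c in ranking))


# Regular (A-pattern) ranking, with Non-Fin-Ft assumed to sit below Head-R.
REGULAR = ["Non-Fin", "WSP", "Head-R", "Parse-Syl", "Non-Fin-Ft"]
# Non-Fin demoted to immediately below Head-R.
NON_FIN_DEMOTED = ["WSP", "Head-R", "Non-Fin", "Parse-Syl", "Non-Fin-Ft"]
# Non-Fin-Ft promoted to immediately above Head-R.
NON_FIN_FT_PROMOTED = ["Non-Fin", "WSP", "Non-Fin-Ft", "Head-R", "Parse-Syl"]

for label, ranking in [
    ("regular", REGULAR),
    ("Non-Fin demoted", NON_FIN_DEMOTED),
    ("Non-Fin-Ft promoted", NON_FIN_FT_PROMOTED),
]:
    print(label, "->", evaluate(ranking, PROFILES))
```

With these assumed profiles the regular ranking selects penultimate stress, demoting Non-Fin selects final stress, and promoting Non-Fin-Ft selects antepenultimate stress, in line with the outcomes reported in the text. Since WSP assigns one mark to every candidate here, the decision falls to Head-R and the two reranked constraints.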
The introduction of No-Clash has an interesting and welcome consequence
for at least one further aspect of Dutch word stress. In the area of
secondary stress, Dutch, much more so than English (Kager 1989), allows initial
trochees of the L-H foot type, preferring [(à-lek)(sán-dra)] to
*[a-(lèk)(sán-dra)] (similarly: ànecdóte, Yàwelmáni, Ràwalpíndi). Without No-Clash, the
ungrammatical form would be the winner, because its rival contains a WSP
violation (both obey Bin and Non-Fin); with No-Clash the correct candidate
is selected, assuming No-Clash » WSP (which was assumed all along).
One can think of two other ways of getting this result: by Parse-Syl (as in
Pater 1997: 212) or by Align-Edge-L. But these attempts fail, because both
constraints are known to be ranked below WSP.
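The interaction of No-Clash and WSP in the Alexandra case can be made concrete with a short sketch. It rests on assumptions that should be read as such: closed syllables count as heavy (as stated in the notes for Dutch), WSP is taken to penalise every heavy syllable that is not a foot head, and No-Clash is taken to penalise adjacent foot heads. The helper names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Each syllable is (text, is_heavy, is_foot_head).
Syllable = Tuple[str, bool, bool]
Candidate = List[Syllable]

CANDIDATES: Dict[str, Candidate] = {
    "(à-lek)(sán-dra)": [
        ("a", False, True), ("lek", True, False),
        ("san", True, True), ("dra", False, False),
    ],
    "a-(lèk)(sán-dra)": [
        ("a", False, False), ("lek", True, True),
        ("san", True, True), ("dra", False, False),
    ],
}


def no_clash(cand: Candidate) -> int:
    """Count pairs of adjacent foot heads (assumed reading of No-Clash)."""
    heads = [is_head for _, _, is_head in cand]
    return sum(1 for left, right in zip(heads, heads[1:]) if left and right)


def wsp(cand: Candidate) -> int:
    """Count heavy syllables that are not foot heads (assumed reading of WSP)."""
    return sum(1 for _, heavy, is_head in cand if heavy and not is_head)


def winner(ranking: List[Callable[[Candidate], int]]) -> str:
    """Return the candidate whose violation vector is lexicographically least."""
    return min(CANDIDATES, key=lambda n: tuple(c(CANDIDATES[n]) for c in ranking))


print("WSP only:       ", winner([wsp]))
print("No-Clash » WSP: ", winner([no_clash, wsp]))
```

With WSP alone the clashing parse wins, because the L-H trochee leaves the heavy syllable lek unstressed; once No-Clash dominates WSP, the parse with the initial L-H trochee is selected.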
In dominating WSP, No-Clash also enforces a stresswise less crucial
choice between the two representations (róbot) and (ró)(bot), in favour of the
former. The L-H foot type implied in these examples is neither frequent nor
uncontroversial. It has been argued to exist, however, as part of the ‘Germanic
foot’ inventory of Old English by Dresher and Lahiri (1991), of the ‘secondary
stress foot inventory’ of modern English by Pater (1997), and of the foot in-
ventory of Finnish by Hanson and Kiparsky (1996). It was employed by Prince
and Smolensky (1993: 50–53) in their analysis of Latin word stress to which
the current analysis of Dutch is partly similar.26
This concludes our incorporation of the test results of this chapter into an
Optimality-theoretic account of Dutch word stress.
Taking as his point of departure the A-pattern grammar of (27) (or one equiv-
alent to it for present purposes), Van Oostendorp’s alternative proposal for B-
pattern exceptionality in words ending in open syllables (cánada) takes the
following form: assume that the irregular foot structure of this type of example
((cána)da) is contained in the underlying form; then add to the grammar a con-
straint demanding no deviations between underlying and surface foot structure:
Faith-Foot.27 As shown by Pater (1995) and Inkelas (1999), such faith-
fulness constraints are an integral part of this approach. C-pattern exceptions
(chocolá) are treated differently. Bin is circumvented by a segmental mod-
ification of the input, to the effect that these words underlyingly end in an
abstract consonant, which acts as a trigger for the analysis ‘normally’ reserved
for superheavy syllables: (choco)(lá-cv).
No exceptional VC-final patterns are discussed in the Van Oostendorp pa-
per, but, given what it does say, their properties may be presumed to run like
this. The analysis of B-type klarinét follows from underlying (klari)(nét-cv),
just as chocolá. For C-type Celébes we want to prespecify the apparent surface
foot structure in the input (/Ce-(lébes)/), and then demand input-output
foot faithfulness. The irregular patterns for the class of words ending in final
superheavy syllables are also not discussed by Van Oostendorp, and it will be
deemed unfruitful to speculate on these cases here.
Nouveau’s (two) experimental results are incorporated as follows. First, the
penultimate-stressed form a-(génda) will always be the winner (as it should) as
long as the A-type analysis (i.e., tableau (33)) is allowed to run its full course.
This means that Faith-Foot (underlying /(agen)da/ implies the incorrect
possibility of antepenultimate stress) will have to be ranked below WSP, which
makes the crucial choice. Second, the B-output (talak)(tán) follows, presum-
ably, from an abstract final -C. Crucially, for (tálak)(tan) even here No-Clash
will have to be posited: if Faith-Foot is ranked below WSP (suppose this to
be the case in (36a)), (tálak)(tan) will follow as (irregularly) optimal from an
underlyingly specified foot only if ta-(lák)(tan) is eliminated as a candidate by
No-Clash. It was exactly this point that was anticipated in the introduction
to this chapter when we said that ‘the active role of No-Clash holds under
both approaches’.
Accepting this result, we cannot at the same time avoid pointing out what
we perceive to be drawbacks of this analysis, precisely involving those two
points that serve as the analytical execution of the alternative: the Faith-
Foot constraint, and the abstract consonants.
Foot faithfulness poses the following problem. It has been pointed out in the
second half of the previous section that words of the type agenda completely
lack either possibility of irregular stress: No-Clash and/or WSP eliminate
them, and just agénda survives. In order to eliminate the possibility of *ágenda
in the prespecified underlying form framework, the following constraint ranking
7. Conclusions
With regard to the goals of this chapter, it has been shown, first, that the ease
with which Dutch children of 3 and 4 years of age imitate carefully selected test
stimuli follows the hierarchy of regularity for word stress patterns predicted by
the metrical theory of Dutch word stress. Children find words with irregular
and prohibited stress more difficult to imitate than words with regular stress;
and children tend to regularise irregular and prohibited patterns, and to preserve
regular stress. The fact that, as observed, 3-year-old children have already mas-
tered most of the stress system suggests that the acquisition of the Dutch stress
patterns is precocious. But the system also improves with age, and does so in a
systematic way: the 4-year-olds have a stress system which is more similar to
the adult model. They have mastered the markedness hierarchy in more words
than younger subjects, and they make fewer errors in imitating most authorised
stress types. Second, given subtle new data obtained both for children and adults
in our tests regarding the Quantity Sensitivity of the stress system, it was shown
that Optimality Theory has the flexibility that the parametric approach lacks
in dealing with these data, in both current frameworks of exceptionality: the
ranking reversal analysis and the prespecified underlying representation anal-
ysis. It was pointed out that in both views the new, experimentally obtained,
Quantity Sensitivity data argue for the active presence in the grammar of an
undominated No-Clash constraint. This constraint turned out to have been
part of the analysis all along, although it made its presence felt only in deep
probes of the Dutch word stress system. On the one hand, a number of restric-
tions were proposed in order to limit the possibilities of the ranking reversal
approach. On the other, it was also pointed out that the only existing attempt at
a ‘prespecified underlying representation’ analysis for the data of this chapter
fails to be a contender because it appears to be flawed in a number of ways.
Finally, observe that the empirically supported relevance of undominated
No-Clash, to the child data as well as to the adult data, is interesting
from another acquisitional point of view. Given the current general assumption
that phonological acquisition is essentially the empirically driven modification
of a starting-point at which all substantive constraints precede all faithfulness
constraints (Gnanadesikan 1995/this volume, Tesar and Smolensky 1998), there
is nothing odd in the view essentially proposed here that No-Clash serves as
an unviolated constraint supervising all acquisitional stages up until adulthood.
This chapter therefore can be said to illustrate well the point made by Inkelas
(1999) cited at the end of the introduction.
notes
* This contribution is a revised version of chapters 4 and 5 of Nouveau (1994). We are
grateful to Jan Don, René Kager, and Joe Pater for comments on an earlier version
of this chapter, and to Marc van Oostendorp (1997) for his elaborate comments on
Nouveau (1994).
1. See, for instance, Berko Gleason (1997: 106–7): ‘By three, most children can produce
all the vowel sounds and nearly all the consonant sounds. This does not mean that
their productions are 100 percent accurate, but rather that the sounds are produced
correctly in at least a few words. Consonants that are likely to be in error [in English],
even at the age of four or five, are the liquids [and both th’s]. In most cases, correct
production of all sounds is achieved by around seven years of age.’ It has to be
taken into account, of course, that for any given child perception will be ahead of
production.
2. At Level 1, the right edge trisyllabic window is violated only under very special
circumstances. The suffix -ief, for instance, usually carries main stress (normatíef
‘normative’, sensitíef ‘sensitive’), but names of grammatical cases, which typically
contain this suffix, have initial stress in disregard of the right edge window: génitief,
nóminatief, áccusatief, etc. There are a number of further classes of this type.
3. Occasional subregularities interfere, not surprisingly, with the broad picture. Final
main stress is, for instance, the dominant pattern among words ending in front round
vowels, which are mostly loans from French: menú, continú, parvenú, individú,
miliéu, gonorróea [-ˈrø] (vs. jiujítsu, Málmö).
4. Curiously, in Dutch, originally Greek names ending in -eus are pronounced with
(unstressed) [-œys]: Zeus [zœys], Néreus [ˈnerœys]. Prométheus [proˈmetœys] is
the only case of its type: prefinal stress when the final syllable is superheavy,
in a word of three or more syllables.
5. For further examples, see Van Marle (1980) and Kager (1989).
6. Thus, Dutch differs from English in two respects: (i) just heavy syllables are extra-
metrical rather than all syllables; (ii) the light/heavy distinction is defined as open
vs. closed, Dutch lacking the distinction between long and short vowels in open
syllables; for further discussion, see Kager (1989), Zonneveld (1993).
7. See Hochberg (1988: 690–693) for a description of the similar design of her study
of Spanish stress.
8. So far, words ending in a diphthong ([i] in these two cases) were not discussed
here. Their stress behaviour closely resembles that of superheavy syllables (averíj,
karwéi ‘job’), so they are added in this subsection of the test items.
9. Since these items were read out, they invited irregular but existing
spelling↔pronunciation correspondences. In this case [jaˈko] was found twice,
presumably modelled on depot [deˈpo] and Peugeot [pøˈzjo].
10. Nouveau (1994: 102–103) attaches significance to the observation that the violation
is more frequent when the medial syllable ends in a sonorant. It may well be the case,
however, that the latter’s higher number is caused by the influence of the similar
(and regular) existing word báriton ‘baritone’, implying that the P-figure for the
sonorant example is excessively high (rather than that of the internal obstruent case
surprisingly low).
11. Curiously, the loan Colargol, the name of a popular (originally French) cartoon bear
à la Pooh, has either final (as in French) or initial main stress; there is no doubt, on
the other hand, that Bibendum, the original French name of the ‘Michelin man’, is
pronounced with prefinal stress.
Data are slightly more numerous when the initial syllable is closed. Hermandad
‘the Police’ (prefinal stress in Spanish) can have main stress on each of its syllables.
Spanish family names such as Fernandez have either prefinal (as in Spanish) or final
stress. Foreign geographical names such as Léxington, Dárlington, and Hélsingfors
may be assumed to have a morphological structure that blocks the application of the
(Level 1) stress rules to the full word. The word badminton (the game, few speakers
of Dutch are aware that it is the name of a town, too) can have main stress on the
initial syllable (as in English), but also has a regularised variant with prefinal stress.
Such VC-final examples are just about as (in)frequent as ones ending in open
syllables, such as Hélsinki/Helsínki and chímpansee/chimpansée (Timbouktu has
prefinal stress, Katmandu has prefinal or final stress); or ones ending in superheavies,
such as Ístanboel (main stress on the other syllables occurs too).
12. A detailed account of the growth of prosodic competence in Dutch children between
the ages of 1;6 and 2;6 can be found in Fikkert (1994, 1998).
13. Chances are that [kadót] is a regular item in the speech of the children involved, since
it is also known as a relatively common child speech misanalysis of the ‘stem’ of
the frequent diminutive kadóo-tje ‘(little) present’ (as kadóot-je, which is a formal
possibility, cf. bóot-je ‘little boat’). Given what was said above, páprika may be
an A-type example of a subpattern for words with prefinal -i-, and therefore not a
genuine case of improvement.
14. Although investigating further age groups (2- and 5- or 6-year-olds, for instance)
might have provided additional insights, there were two reasons for not executing
such experiments: the limited timespan available to the Ph.D. trainee (Nouveau),
and the outcome of an informal pilot test that the required task was completely
unsuitable for 2-year-olds.
15. For a similar way of viewing the difference between Parameter Theory and OT, see
Itô and Mester (1996: 185); for a criticism of their approach to ‘lexical strata’, see
Inkelas et al. (1997: 403–404).
16. In order to save space we use in this discussion representations that linearly indicate
syllable structure, footing, and accentuation; grid configurations of the kind used in
section 2 are implied throughout, however.
17. In the vocabularies of many (non-Germanic) languages words occur that are com-
pletely meaningless to the Dutch ear and eye, which can serve as nonsense test
items in order to confirm or disconfirm the intuition leading to this foot organisa-
tion. Words such as Shravanabelagola (town in India) and Catapathabramana (Old
Indic manuscript) appear to share the structure indicated.
18. Her model does allow co-phonologies, but just when systematically associated with
(consecutive) ‘levels’ of the lexicon, not within a single level. Pater (1995), in an
unpublished study of secondary stress in English, has (i) prespecified underlying
forms for truly exceptional forms, and (ii) a device he calls ‘lexically specific rank-
ing’ for ‘lexically based variation’. In the latter case, the following situation obtains:
C1sp » C2 » C1gen, where the two versions of C1 apply in lexically specified cases
and in the general case, respectively. This looks like ‘reversed constraint ranking’,
but an investigation into the similarities and differences between the two approaches
has not been carried out in the context of our research on Dutch stress.
19. There still is a task for Bin, because *Ca(ná)da, i.e., simply a variant of the regular
form satisfying Non-Fin-Ft, must be excluded (henceforth bracketing in a tableau
indicates the exceptionality of the constraint’s hierarchical position).
20. An alternative would be to reverse the Trochee » Iamb hierarchy implicit in the
analysis, leading to a violation of the standard foot type of the language for at least
some irregular forms: cho-(colá). To have more than one foot type in a language
is usually considered highly undesirable (Inkelas 1999: 144, 150), but cases are
known to exist, Yidiny being the standard reference (Hayes 1985: 442, 1995: 260–
262). Even in this latter case, however, individual words must show labeling
harmony: ‘if at least one foot of a word constitutes a canonical iamb, then all the
feet of the word are made iambic; otherwise all feet are made trochaic’ (Hayes 1995:
260). Dutch is not like that, as shown by four-syllabic examples such as càvalaríe (in
(1)), ìndividú, and so on. Below we shall formulate another reason why the iambic
analysis of these cases goes against the grain of the language.
21. Irregular Celébes, cf. regular álmanak in (26), esp. (26b).
(Céle)(bes) **
☺ Ce-(lébes) * *
22. It also implies that bisyllabic exceptional words such as líchaam ‘body’ and árbeid
‘work’ from (2c) cannot be generated. The proposed solution for these cases lies in
the distinction between the ‘native’ and ‘non-native’ Dutch vocabulary, which can be
shown to exist on synchronic phonological and morphological grounds (Zonneveld
1993, Trommelen and Zonneveld 1992). The words in question belong to the ‘native’
lexicon, to which the defective syllable analysis fails to apply. If these words are
bisyllabic, non-final stress follows.
23. With hindsight, this move is statistically motivated, too: fricandéau is one of the
very few examples of X-H-L words. Two others are Katmandú and chimpansée, the
latter of which has an alternative pronunciation chímpansee, which is one of the very
few violations of Quantity Sensitivity in the language (see n. 11). The frequency of
the stressed nominalising suffix -íe after stems ending in consonant clusters (next
to strateg-íe one finds isomorf-íe, monarch-íe, industr-íe) invites a careful analysis
of this suffix in terms of the Level 1/Level 2 distinction. (This suffix also has an
irregular plural, violating the generalisation that Dutch nouns ending in vowels take
-s rather than -en: strategíe-en, monarchíe-en, and so on; Zonneveld 1999, ch. 9,
points out that historically this suffix was -íë [-íjə].)
24. The weight of the penultimate syllable is indeed to blame for this test result, and
not the popularity of final vowel main stress as such; recall the following additional
test results, which show that children make far fewer mistakes when the penultimate
syllable is open, and adults consider final main stress an option:
                3YOs                  4YOs          adults
                                                    antep.  pen.  fin.
    fe-ni-mo    30(B)  10(A)  40(C)   10   25  30   15      1     4
    fa-gu-rie                                       1       9     10
25. In an unhappy lapse of accuracy, Van Oostendorp (1997: 140, tableau 5) crucially
misrepresents this case by failing to mention (tálak)(tan), giving – instead – (tálak)-
tan, as a candidate violating WSP, but also, completely trivially, Parse.
26. One might think head-ship (or not) of the final syllable to matter, for instance, for
vowel reduction. Among the intricacies of Dutch vowel reduction, however, is the
immunity of word-final syllables (of any type) to it. For discussion see Kager (1989)
and Trommelen and Zonneveld (1999).
27. At this point the purpose of this constraint will be clear; below, its ranking will be
considered more carefully.
28. An elaborate recent defence of the ‘abstract’ approach to English stress can be found
in Burzio (1994).
29. Zonneveld (1986) came close to this in proposing that these parenthesised elements
be considered stress-neutral affixes.
References
Benua, L. (1995). Identity effects in morphological truncation. In J. Beckman, L. W.
Dickey, and S. Urbanczyk (eds.) UMOP 18, Papers in Optimality Theory. Graduate
Linguistic Student Association (GLSA), University of Massachusetts, Amherst.
77–136.
Berko Gleason, J. (1997). The Development of Language. Boston: Allyn and Bacon, 4th
edn.
Burzio, L. (1994). Principles of English Stress. Cambridge: Cambridge University Press.
Cabré, T. and M. Kenstowicz (1995). Prosodic Trapping in Catalan. LI 26. 694–704.
Chomsky, N. and M. Halle (1968). The Sound Pattern of English. New York: Holt,
Rinehart, and Winston.
Dresher, B. E. and A. Lahiri (1991). The Germanic foot: metrical coherence in English.
LI 22. 251–286.
Echols, C. and E. Newport (1992). The role of stress and position in determining first
words. Language Acquisition 2. 189–220.
Fikkert, P. (1994). On the Acquisition of Prosodic Structure. Ph.D. dissertation, HIL,
Leiden University.
(1998). The acquisition of Dutch phonology. In S. Gillis and A. de Houwer (eds.) The
Acquisition of Dutch. Amsterdam: Benjamins. 163–222.
Gnanadesikan, A. (this volume). Markedness and faithfulness constraints in child
phonology.
Halle, M. and J.-R. Vergnaud (1987). An Essay on Stress. Cambridge, Mass.: MIT Press.
Hanson, K. and P. Kiparsky (1996). A parametric theory of poetic meter. Lg 72. 287–
335.
Hayes, B. (1985). Iambic and trochaic rhythm in stress rules. In M. Niepokuj et al. (eds.)
BLS 11. 429–446.
(1995). Metrical Stress Theory: Principles and Case Studies. Chicago, Ill.: The Uni-
versity of Chicago Press.
Hochberg, J. D. (1986). The Acquisition of Word Stress Rules in Spanish. Ph.D. disser-
tation, Stanford University.
(1988). Learning Spanish stress. Lg 64. 683–706.
Inkelas, S. (1999). Exceptional stress-attracting suffixes in Turkish: representations vs.
grammar. In R. Kager, H. van der Hulst, and W. Zonneveld (eds.) The Prosody–
Morphology Interface. Cambridge: Cambridge University Press. 134–187.
Inkelas, S., O. Orgun, and C. Zoll (1997). The implications of lexical exceptions for
the nature of grammar. In I. Roca (ed.) Derivations and Constraints in Phonology.
Oxford: Clarendon Press. 393–418.
Itô, J. and A. Mester (1996). The core-periphery structure of the lexicon and constraints
on reranking. In J. N. Beckman, L. Walsh Dickey, and S. Urbanczyk (eds.) UMOP
18, Papers in Optimality Theory. GLSA, University of Massachusetts, Amherst.
181–210.
Kager, R. (1989). A Metrical Theory of Stress and Destressing. Ph.D. dissertation,
Research Institute for Language and Speech, Utrecht University. (Published by
Foris, Dordrecht.)
Kager, R., E. Visch, and W. Zonneveld (1987). Nederlandse woordklemtoon (hoofd-
klemtoon, bijklemtoon, reductie, voeten). Glot 10. 197–226.
Kehoe, M. and C. Stoel-Gammon (1997). An investigation of current accounts of chil-
dren’s prosodic development. Lg 73. 113–144.
Kiparsky, P. (1968). How abstract is phonology? Distributed by Indiana University
Linguistics Club. Also in O. Fujimura (ed.) Three Dimensions of Linguistic Theory.
Tokyo: Tec 1973. 5–56, and in P. Kiparsky, Explanation in Phonology. Dordrecht:
Foris 1982. 119–164.
Liberman, M. Y. and A. S. Prince (1977). On stress and linguistic rhythm. LI 8. 249–336.
Lohuis-Weber, H. and W. Zonneveld (1996). Phonological acquisition and Dutch word
prosody. Language Acquisition 5. 245–284.
McCarthy, J. J. and A. S. Prince (1993). Generalized alignment. Yearbook of Morphology
1993. 79–154.
(1994). The emergence of the unmarked: optimality in prosodic morphology. In M.
Gonzalez (ed.) NELS 24. 333–379.
(1999). Faithfulness and identity in prosodic morphology. In R. Kager, H. van der
Hulst, and W. Zonneveld (eds.) The Prosody–Morphology Interface. Cambridge:
Cambridge University Press. 218–309.
Nouveau, D. (1994). Language Acquisition, Metrical Theory, and Optimality. Ph.D.
dissertation, Research Institute for Language and Speech, Utrecht University.
Oostendorp, M. van (1997). Lexicale variatie in optimaliteitstheorie (Besprekingsartikel
van Nouveau 1994). Nederlandse Taalkunde 2. 133–154.
Pater, J. (1995). On the nonuniformity of weight-to-stress and stress preservation effects
in English. MS., McGill University.
(1997). Minimal violation and phonological development. Language Acquisition 6.
201–253.
Prince, A. (1980). A metrical theory for Estonian quantity. LI 11. 511–562.
Prince, A. and P. Smolensky (1993). Optimality Theory, constraint interaction in genera-
tive grammar. MS., Dept of Linguistics, Rutgers University, and Dept of Computer
Science, University of Colorado at Boulder.
Tesar, B. and P. Smolensky (1998). Learnability in optimality theory. LI 29. 229–268.
Tranel, B. (1994). French liaison and elision revisited: a unified account within optimality
theory. Read at the Linguistic Symposium on Romance Languages 24. University
of California at Los Angeles and University of Southern California.
Trommelen, M. (1983). The Syllable in Dutch: with Special Reference to Diminutive
Formation. Ph.D. dissertation, Utrecht University. (Published by Foris, Dordrecht.)
(1991). Dutch word stress assignment: extrametricality and feet. In T. F. Shannon and
J. P. Snapper (eds.) The Berkeley Conference on Dutch Linguistics 1989: Issues and
Controversies, Old and New. Lanham, Md.: University Press of America. 157–172.
Trommelen, M. and W. Zonneveld (1989). Klemtoon en Metrische Fonologie. Muider-
berg: Coutinho.
410 Index of subjects
Dual Lexicon Model 13, 57, 61, 64, 238–239 implicational universal(s) (see universals)
dummy (segment, syllable) 64, 84–86, 103, Independence Principle 247
387 infant articulation 61, 326
duplication problem 22–23, 46–47 infant speech perception 40–41, 159–162, 196,
221–226, 324, 328, 332–334
emergence hypothesis 39, 56 initial state 1, 3, 26, 31, 37, 40–44, 61, 73, 83,
Emergence of the Unmarked 26–28, 35, 74, 100, 139, 176–177, 204–205, 249,
79–81, 93, 100, 116, 230, 292, 297, 301, 263–265, 280, 292–293, 301, 322–326,
305, 309, 311 328, 341–342, 403
error-driven learning 194, 262 innateness hypothesis 38–39, 56, 58, 61, 73,
Evaluation (Eval) 18 80, 83, 116, 185, 323
input (adult – for the child) 6–7, 30, 33–34
factorial typology 28–30, 37, 207–208 intermediate grammars 205, 210–215, 258,
Faithfulness 288
faithfulness constraint 19–21, 40, 58,
76–77, 132, 230, 233, 249, 322 labial (role of feature – in early acquisition)
faithfulness to prosodic heads 133 87–89, 93–98
input-to-output faithfulness 76 learnability 1, 30–34, 38, 41–43, 45, 47,
output-to-output faithfulness 20, 43, 186, 185, 188, 232, 246, 292, 323–326,
188–189, 192, 236, 278 360–363
positional faithfulness 172, 271–273, learning algorithms 30–34, 167, 252,
277–278 311
final devoicing 6, 25–26, 43, 58, 64 batch vs. serial processing 195
final state 342, 357 Biased Constraint Demotion (BCD)
foot (in stress systems) 225, 304, 307–308, 192–194, 199, 259–270, 275–276
405 Constraint Demotion Algorithm (CDA)
fossilisation (see phonological idioms) 30–34, 42, 168–170, 175–176, 186, 195,
frequency (in input) 38, 61, 191, 204, 197, 232, 253, 254, 264
213–216, 222, 357–359 Gradual Learning Algorithm (GLA) 30,
fricatives (in acquisition) 2, 5, 7, 12, 14, 63, 232, 323
144–146, 349 Low Faithfulness Constraint Demotion
functional motivation 19, 20, 39 (LFCD) 177–185, 187, 191, 193–195,
fusion (of consonants) (see coalescence) 270–271
morphophonemic learning algorithm 187
Generative Phonology 3–4, 7, 21, 54, 56–59, Multi-Recursive Constraint Demotion
245 (MRCD) 258
Generator (Gen) 18, 75, 169, 194, 237 learning paths 207–209, 234
glides (in acquisition) 36, 77, 95, levels of representation 233
104–106 lexical exceptions 60, 369–370, 390
glottal segments (in acquisition) 104 lexical representation 4–7, 13–15, 24, 31, 33,
gradient well-formedness 185 46, 57–58, 67, 74, 84, 87–89, 102,
grammar 3, 4, 7, 17, 30, 38, 234 116–117, 127–131, 151, 158, 163,
Guttman scale 209, 211 167–168, 186, 217, 223–225, 229, 233,
235, 240, 254, 363
Habituation/Dishabituation Procedure lexical stratification 338
222–223 Lexicon Optimisation 24, 116–117, 168,
harmonic completeness 349 229
Headturn Preference Procedure 41, 226, 232, liquids (in acquisition) 77, 95, 104, 120–123,
328–329, 332 212
heavy (vs. light) syllables (see syllable weight) r-liquid 121–126
hidden strata (see covert strata) l-liquid 14, 105
loanword adaptation 25, 43–44, 292–295, 299,
identity mapping 24–25, 168–191, 250, 251, 301–304, 306–307, 309–310, 316, 322,
253, 262–263, 287, 326 337, 339–340
imitation task 381 Local Conjunction 210–211, 217, 350–353
414 Index of names
Jakobson, Roman 1–3, 9, 11, 17, 20, 22, 38, 40, 45, 56, 58–60, 116, 128
Jespersen, Otto 102
Jones, Lawrence 56
Jongstra, Wenckje 153
Joppen-Hellwig, Sandra 144
Jun, Sun-Ah 162, 195, 359
Jusczyk, Peter 22, 40, 41, 44, 45, 47, 61, 161, 164, 185, 195, 196, 199, 217, 221, 225–226, 229, 232–233, 240, 293, 328, 329, 332, 334, 335
Kager, René 20, 38, 46, 118, 123, 151, 186, 195, 196, 206, 212, 217, 227, 240, 316, 370–372, 387, 391, 400, 403, 404, 407
Kahn, Daniel 117, 146
Kari, James 92
Katayama, Motoko 317, 338
Kawakami, I. 310
Kawasaki-Fukumori, Haruko 55
Kaye, Jonathan 15, 16, 46, 118, 119, 151–153, 287
Kazazis, Kostas 188
Kean, Mary Louise 8
Keating, Patricia 160, 170–171, 195
Kehoe, Margaret 40, 103, 387
Kello, Christopher 61
Kemler-Nelson, Deborah 329
Kenstowicz, Michael 20, 46, 118, 186, 230, 306, 316, 393
Kingston, John 240
Kiparsky, Paul 3, 5, 11–12, 20, 57, 58, 67, 92, 105, 158, 187, 245, 338, 400, 402
Kirchner, Robert 47, 167–168, 210, 359, 364
Kisseberth, Charles 7, 12, 23–24, 46, 58, 60, 165–166, 245, 253
Kitagawa, Yoshihisa 304
Kitahara, Mafuyu 338
Kochetov, Alexei 117
Kohler, Klaus 360
Koutsoudas, Andreas 179, 273
Krech, Holly 68
Kuhl, Patricia 160–161, 191
LaCharité, Darlene 47, 294
Lahiri, Aditi 400
Lalonde, Chris 161
Lamontagne, Greg 90–92, 104
Langendoen, Terence 46, 59
Lebel, Éliane 121, 292
Legendre, Geraldine 339, 357, 364
Leonard, Laurence 129
Leopold, Werner 63
Levelt, Clara 37–40, 42, 62, 64–66, 158, 205, 209–210, 217, 240, 249, 253
Levelt, Willem 217, 323
Levin, Juliette 118, 119
Liberman, Mark 303, 398
Light, Timothy 93
Lindblom, Björn 61
Linker, Wendy 170
Lleó, Conxita 140, 153
Lloyd, Valerie 222
Lohuis-Weber, Heleen 150, 387
Lombardi, Linda 29, 102, 170, 271, 274
Lowenstamm, Jean 118
Lubowicz, Anna 364
Luce, Paul 161
Macken, Marlys 3, 46, 57, 59, 62–63, 102, 116, 195
MacNeilage, Peter 64–66
MacWhinney, Brian 45
Manzini, Rita 247
Maraist, Matthew 68
Markey, Kevin 61
Martohardjono, Gia 292
Matthei, Edward 14–15, 58, 63, 64, 66, 220, 224, 238
Matthews, John 150, 221
Mattys, Sven 328, 332
McCarthy, John 5, 18–20, 26–28, 35, 36, 43, 46, 66, 74, 76, 79, 84, 86–87, 90–92, 102–106, 115, 117, 132, 133, 146, 188, 227, 229–230, 236, 237, 240, 278, 287, 295, 310, 311, 313, 369, 389–391, 393, 402
McCawley, James 317
McHugh, Brian 197
McIntosh, B. J. 61
Mendel, Gregor 54
Menn, Lise 1–3, 5, 11–15, 17, 45, 56–61, 63–67, 158, 196, 220, 224, 238
Menyuk, Paula 54, 55
Mester, Armin 47, 62, 122, 134, 210, 230, 238, 251, 263, 277, 280, 292, 296, 304, 307, 312, 314, 316, 338, 357, 405
Michelson, Karin 273, 306
Miller, J. D. 160
Miller, Wick 118
Moreton, Elliott 239
Morrisette, Alanis 39
Moskowitz, Arlene 63
Nagy, Naomi 339, 357
Nathan, Geoffrey 43
Newman, Rochelle 360
Todorova, Marina 339, 357
Tranel, Bernard 195
Treiman, Rebecca 152
Trommelen, Mieke 118, 119, 123, 151, 370–372, 377–379, 387, 406, 407
Tropf, Herbert 342
Trubetzkoy, G. N. 394
Tsuchida, A. 309–310, 318
Turkel, William 47, 167–168
Urbanczyk, Susan 230
Vainikka, Anne 339, 357
Vance, Timothy 190
Van de Vijver, Ruben 37–38, 40, 316, 323
Van de Weijer, Jeroen 151
Van de Weijer, Joost 213, 217
Van der Hulst, Harry 58, 118, 119, 123, 151, 212
Van der Torre, Erik-Jan 153
Van Heuven, Vincent 217
Van Marle, Jaap 404
Van Oostendorp, Marc 212, 249, 370, 390, 400–403, 406
Vapnik, V. 248
Velleman, Shelley 40, 61, 68, 102, 224, 225
Velten, H. 10
Vennemann, Theo 118
Vergnaud, Jean-Roger 118, 391
Vigorito, James 221
Vihman, Marilyn 40, 45, 57, 59–61, 67–69, 195, 224
Visch, Ellis 370
Waals, Juliette 151
Walter, Henriette 295
Walther, Markus 194
Wang, Chilin 292, 341
Waterson, Natalie 59
Werker, Janet 161, 164, 221–224, 232–233, 240
Werle, Adam 47
Wessels, Jeanine 161
Westbury, John 170–171
Wexler, Ken 247
Whitney, William 77–78
Wiese, Richard 118, 121, 150, 151
Wijnen, Frank 225
Wilson, Colin 195, 198, 336
Wilson, H. S. 103
Wright, Richard 117
Yeni-Komshian, Grace 61
Yip, Moira 47, 85–87, 93, 338
Zec, Draga 103
Zoll, Cheryl 42, 197, 278, 369
Zonneveld, Wim 43–44, 118, 119, 123, 150, 153, 217, 240, 316, 370–372, 377–379, 387, 391, 404, 406, 407
Zubritskaya, Ekaterina 339, 357
Zuraw, Kie 47, 186
Zwicky, Arnold 102