ExLing2016proceedings PDF
ExLing 2016
Antonis Botinis
National and Kapodistrian University of Athens
All content following this page was uploaded by Antonis Botinis on 06 November 2016.
ExLing 2016
Proceedings of 7th Tutorial and Research Workshop on
Experimental Linguistics
This volume includes the proceedings of ExLing 2012, the 5th Tutorial and Research
Workshop on Experimental Linguistics, in Athens, Greece, 27-29 August 2012. The
first conference was organised in Athens, in 2006, under the auspices of ISCA and
the University of Athens and has been repeated regularly since, including the last one
in Paris, in 2011.
In accordance with the spirit of this ExLing 2012 conference, we were
once again gathered in Athens to continue our discussion on the directions of
linguistic research and the use of experimental methodologies in order to
gain theoretical and interdisciplinary knowledge. We are happy to see that
our initial attempt has gained ground and is becoming an established forum
of a new generation of linguists.
As in our previous conferences, our colleagues come from many
different parts of the world, and we wish them a rewarding exchange of
scientific achievements and expertise. This is indeed the core of the ExLing
events, which promote new ideas and methodologies in an international
context.
We would like to thank all participants for their contributions as well as
ISCA and the University of Athens. We also thank our colleagues from the
International Advisory Committee and our students from the University of
Athens for their assistance.
Antonis Botinis
Contents
Tutorial papers
Remanence of sentence prosody in Romance languages
Philippe Martin
Rich Reduction: Sound-segment residuals and the encoding of communicative
functions along the hypo-hyper scale
Oliver Niebuhr
Research papers
Visual search strategies and letter position encoding in Russian
Svetlana Alexeeva
Emergence of word prosody in (Seoul) Korean
Angeliki Athanasopoulou, Irene Vogel
Voice Activity Detector (VAD) based on long-term phonetic features
Andrey Barabanov, Daniil Kocharov, Sergey Salishev, Pavel Skrelin,
Mikhail Moiseev
The identification of two Algerian Arabic dialects by prosodic focus
Ismaël Benali
Intonation and polar questions in Greek revisited
Antonis Botinis, Anthi Chaida, Olga Nikolaenkova, Elina Nirgianaki
The imprint of disposition in social interaction
Mark Campana
Intonation and polar questions in Greek
Anthi Chaida, Angeliki Sotiriou, Athina Kontostavlaki
Contextual predictions and syntactic analysis: the case of ambiguity resolution
Daria Chernova, Veronika Prokopenya
Vocal fatigue in voice professionals: collecting data and acoustic analysis
Karina Evgrafova, Vera Evdokimova, Pavel Skrelin, Tatiana Chukaeva
Creating a subcorpus of a heritage language on the example of Yiddish
Valentina Fedchenko, Ilia Uchitel
Affricates in the spontaneous speech of Aromanians in Turia
Anastasia V. Kharlamova
L1 transfer, definiteness and specificity of determiners in L2 English
Sviatlana Karpava
Writing-based wordforms vs. spoken wordforms
Vadim Kasevich, Iuliia Menshikova
Effect of saliency and L1-L2 similarity on the processing of English past tense
by French learners: an ERP study
Maud Pélissier, Jennifer Krzonowski, Emmanuel Ferragne
Phonostylistic study of Spanish-speaking politicians: Populist vs. Conservative
Carmen Patricia Pérez
Experimental L2 text production with WinPitch LTL
Darya Sandryhaila-Groth
Exploring prosodic convergence in Italian game dialogue
Michelina Savino, Loredana Lapertosa, Alessandro Caffò, Mario Refice
Syllable cueing and segmental overlap effects in tip-of-the-tongue resolution
Nina Jeanette Sauer
An experimental study of English accent perception
Elena Shamina
Phonetic words duration simulation using Deep Neural Networks
Alexander Shipilo
Transcription: what is meant by accuracy and objectivity?
Pavel Skrelin, Nina Volskaya
Grammatical change and hindcast model statistics – A comparison between
Medieval French and Brazilian Portuguese
Eduardo Correa Soares
The Phonetics of Russian North Bylinas
Svetlana Tananaiko, Marina Agafonova
Association experiment in practice of linguistic and cultural dominants
research
Svetlana Takhtarova, Diana Sabirova
Filled pauses and lengthenings detection using machine learning techniques
Vasilisa Verkhodanova, Vladimir Shapranov, Alexey Karpov
Psycholinguistic evidence for the composite group
Irene Vogel, Angeliki Athanasopoulou
Remanence of sentence prosody in Romance
languages
Philippe Martin
LLF, UFR Linguistique, Université Paris Diderot Sorbonne Paris Cité
Abstract
Romance languages use surprisingly similar melodic contours to encode the
sentence prosodic structure. The fact that these contours are governed by similar
prosodic grammars and that similar stress rules also apply to these
languages (except French, which lacks lexical stress) suggests that these
phonological facts were inherited from Latin without much change, despite the
constant evolution that has occurred over twenty centuries.
Key words: intonation, prosody, Romance languages, stress, prosodic grammar
Introduction
Sentence intonation is always present in linguistic communication, even
in silent reading. We cannot process language, whether in oral or written
form, without decoding the prosodic structure intended by the speaker or
recovering (or approximating) the intonation intended by the writer.
Indeed, due to memory limitations, it is not possible to retain long lists of
objects such as words or syntagms without structuring these lists by some
hierarchical grouping. Remembering large numbers or long lists of digits, as
found in telephone or credit card numbers, requires structuring this
information into small chunks, possibly organized into two or more levels,
in order to form a structure. In these specific cases, where digits lack any
morphological information, it is only the prosody, organized into a prosodic
structure, that gives the listener enough indications to restore the
intended structure of the data. In reading, this role falls to graphic
indicators such as blanks separating groups of digits or of words.
In speech communication, although many morphological or grammatical
tools are available to recover a structure from the sequence of syllables
pronounced by a speaker, it is again the prosodic structure which provides
the first and essential hints to decode the sentence structure.
Intonation in Romance
Investigating similarities of prosodic structures in Romance languages
involves three main topics:
Prosodic grammar
The object of prosodic research is to determine the phonological features of
the contours located on the stressed syllables of accent phrases, and to discover the
underlying grammar which implements the dependency relations between
contours. Another remarkable point pertains to the time-linear properties
related to the processes of encoding and decoding the prosodic structure.
Considering that prosodic events instantiated by melodic contours occur
not simultaneously but one after the other on the time line instantiated by the
sequence of syllables, it can be shown (Martin, 2015) that it is necessary and
sufficient to evaluate the dependency relation between two consecutive
contours, provided a ranking between phonological contours has been
established.
The ranking of prosodic contours in French is Cn < C2 < C1 < C0, whereas the
other Romance languages present the inverted ordering C1 < C2.
Despite these differences, the prosodic grammar operates the same way in
French and in the other Romance languages. By comparing two successive
melodic contours, say Cx and Cy, relative to their ranking, the listener is
able to assemble or not the prosodic words implied:
if Cx < Cy, the accent phrases attached to Cx and Cy are merged [Cx Cy]
else if Cx = Cy, the accent phrases attached to Cx and Cy are part of a list, to
be terminated by the occurrence of a contour of higher rank [Cx Cy …
else if Cx > Cy, the accent phrase attached to Cy is not merged with the one
attached to Cx [Cx [Cy…
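The three comparison rules above can be sketched in a few lines of Python. This is only an illustrative reading of the rules, not the author's implementation: the numeric rank values, the relation labels ("merge", "list", "no-merge") and the function names are assumptions introduced here, and only the ordering of the contours is taken from the text.

```python
# Illustrative numeric ranks (assumed); only the ordering comes from the text.
FRENCH = {"Cn": 0, "C2": 1, "C1": 2, "C0": 3}         # Cn < C2 < C1 < C0
OTHER_ROMANCE = {"Cn": 0, "C1": 1, "C2": 2, "C0": 3}  # inverted ordering C1 < C2


def relation(cx, cy, rank):
    """Compare two consecutive contours and return a hypothetical label."""
    if rank[cx] < rank[cy]:
        return "merge"      # accent phrases merged: [Cx Cy]
    if rank[cx] == rank[cy]:
        return "list"       # members of a list, closed by a higher-ranked contour
    return "no-merge"       # Cy is not merged with Cx: [Cx [Cy ...


def pairwise_relations(contours, rank):
    """Walk the contour sequence left to right, one consecutive pair at a time."""
    return [(cx, cy, relation(cx, cy, rank))
            for cx, cy in zip(contours, contours[1:])]


if __name__ == "__main__":
    # Hypothetical contour sequence, used only to exercise the rules.
    for cx, cy, rel in pairwise_relations(["Cn", "C1", "C1", "C0"], FRENCH):
        print(cx, cy, rel)
```

Because each decision looks only at the current contour and the one before it, the sketch reflects the claim that comparing two consecutive contours is sufficient once a ranking has been established. The same pair (C1, C2) yields a different relation under the two rankings, which is where French and the other Romance languages diverge.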
Figure 1. Italian example of prosodic structure built by increments along time axis.
Figure 3. French example of prosodic structure built by increments along time axis.
Conclusion
No language is likely to escape the constraint of generating and decoding the
sentence prosodic structure. However, it may be more surprising that
Romance languages (except French) would use the same phonological
melodic contours and the same grammar of intonation to encode the prosodic
structure, which suggests that the melodic contours and the grammar that
describes their use are inherited from Latin, despite the large differences in
phonology, morphology and syntax existing among the languages derived
from Latin.
The remanence of phonological prosodic features among Romance
languages (including French when the absence of lexical stress is
considered) is remarkable and pertains to the following topics:
Stress placement in the accent phrase is clearly derived from the classical
Latin stress rules, with the addition of suffixes and inflections classified as
stressable or unstressable. The same simple stress rule applies to all
Romance languages. Classes of melodic contours are phonologically similar,
with the exception of French, which has no complex contour Cc since it has
no lexical stress. Finally, the principle of contrast of melodic slope also
applies to all Romance languages; French, lacking the complex contour
Cc, uses another ranking in the prosodic grammar, Cn < C2 < C1 < C0,
instead of Cn < C1 < C2 < Cc < C0.
References
Alkire T., C. Rosen, 2010. Romance Languages, an Historical Introduction,
Cambridge University Press.
Delattre P. 1966. Les dix intonations de base du français, French Review 40, 1-14.
Garde P. 1968. L’accent, PUF, Paris, 172 p. / (2013) Lambert-Lucas, Paris.
Martin, Ph. 2015. The Structure of Spoken Language. Intonation in Romance,
Cambridge University Press.
Frota, S., P. Prieto (eds.) 2015. Intonation in Romance. Oxford University Press.
Rich Reduction: Sound-segment residuals and
the encoding of communicative functions along
the hypo-hyper scale
Oliver Niebuhr
Mads Clausen Institute, IRCA, University of Southern Denmark, Alsion, Denmark
Abstract
The H&H (Hypo-Hyper) Theory of Lindblom (1990) is probably one of the most
prominent theories of the phonetic sciences. It was put forward at a time when
research on speech reduction started to undergo a shift in focus from the description
and linguistic embedding of phonological processes to questions about their
phonetic details, contextual factors, perception, and cognitive processing. This shift
in focus, in combination with the application of digital technologies and resources,
has fundamentally changed our knowledge of speech reduction. The present
chapter will argue with reference to examples from different languages - and in
accord with Lindblom's own expectation - that his well-known "tug-of-war"
metaphor needs to be adapted in the light of these changes. The "tug-of-war"
metaphor conceptualizes the realized degree of reduction as a compromise between
economic and intelligible speech. However, first, growing perception evidence
questions the metaphor's key assumption that more articulatory economy and hence
a higher degree of reduction make speech less intelligible for listeners. Moreover, a
one-dimensional hypo-hyper continuum controlled by two antagonistic forces
(speaker and listener) ignores the fact that communicative functions are another
separate driving force for variation in the degree of reduction. Therefore, the author
suggests abandoning the tug-of-war metaphor in favor of an adaptation of
Bolinger's famous wave metaphor.
Introduction
Managing and, ideally, explaining phonetic variation has always been a
key issue in the speech sciences. But it became even more obvious with the
beginning of the "acoustic age" after World War II, when the US military
declassified the invention of the sound spectrograph. It made speech a
precisely analyzable research object. The rapidly developing computer
technology made this research object accessible to a growing community of
phoneticians (Mattingly 1999), which, in turn, multiplied the number of
questions on phonetic variation and their levels of detail and complexity.
Phonetic variation supported the development of phonetics and phonology as
two different disciplines and later expedited the "divorce" of those
disciplines, with phonology taking care of the well-formed structures of
clearly defined sound (or intonation) categories and their rule-based changes,
and with phonetics measuring the messy, highly variable articulatory and
theoretical framework that also takes into account the listener and his/her
cognitive abilities and processes.
The latter is exactly what was done by Lindblom (1990) in his very
influential H&H theory. "Explaining phonetic variation" (p.403) is the
explicit aim of Lindblom's theory. It compares speech communication to a
tug-of-war, with speaker and listener pulling the rope that represents
phonetic variation in opposite directions, see Figure 1. The speaker follows a
basic ethological principle of all mammals, i.e. striving for economy.
Accordingly, the speaker's aim is to minimize the articulatory effort invested
in speech production and hence reduce the speech signal as much as
possible. The extent to which this is possible is defined by the listener at the
other end of the rope: The speech signal has to contain at least enough
phonetic information to allow the listener to understand the message conveyed
by the speaker. In other words, speakers want to produce "hypospeech", and
listeners want to hear "hyperspeech".
On this basis, the key concept of the H&H theory is that, at each point of
the conversation, the level of speech reduction is an implicitly negotiated
compromise along the hypo-hyper scale between speaker desires and listener
demands. A further key concept is that this dynamic, adaptive compromise
takes into account not only basic factors like speaker physiology (e.g.,
gender, emotions, pathologies) and the environmental acoustics of the
communication situation. The compromise is also made with respect to the
listener's metalinguistic top-down knowledge and context-driven expectation
about which units, functions, and meanings will be contained in the
upcoming speech signal. This allows the speaker to be less clear in or even
completely omit those acoustic cues which s/he knows that the listener can
add in the process of speech perception. This idea was probably the H&H
theory's most important contribution to speech sciences. It replaces
invariance by sufficient contrast and hence goes beyond the common picture
of speech as a machine-like self-contained code that is encoded on the side
of the speaker and transmitted through the air with all elements that the
listener requires to decode it. In contrast, all that speakers need to do in
Lindblom's framework is, broadly speaking, to be sufficiently clear, feed
their listeners with a sufficient number of acoustic cues, and then let their
top-down processes do the rest, i.e. interpret the signal by matching it against
knowledge and expectations, and, if necessary, fill in gaps.
Many studies provide empirical support for the H&H theory. For
example, Hunnicutt (1985) concluded from the results of a combined
production-perception experiment that speakers hyperarticulate more if
words are less predictable in a given semantic (sentence-frame) context.
Fowler & Housum (1987) showed by means of radio news broadcasts that
repeatedly stated words are more hypoarticulated (i.e. reduced) by speakers.
Similarly, Wright (2003) found "easy" words, i.e. frequent words with
relatively few lexical competitors, to be more strongly hypoarticulated than
"hard" words. Finally, we know from a number of experiments that speech
produced under adverse conditions such as noise or greater spatial distances
between the dialogue partners is produced with more effort both
articulatorily and phonatorily (Traunmüller & Eriksson 2000; Junqua 1996).
Despite this converging evidence in favor of H&H, we should not lose
sight of one crucial fact: Lindblom's framework never aimed at explaining
phonetic variation in general. Rather, the framework was developed to
explain the phonetic variation that is relevant to and emerges in connection
with "successful lexical access" (Lindblom 1990:405). However, we know at
least since the rise of intonational phonology (Ladd 2008) that speech
communication is not only about words. Lindblom himself notes that speech
is "produced not only in the laboratory but also in its natural, ecological
settings" (p.418), and he stresses in this context that the assumption of only
two antagonistic forces that create the one-dimensional reduction continuum
from hypo to hyper is a "deliberate simplification that is likely to be revised
in the course of future work" (p.419).
In fact, Lindblom's H&H theory was taken up and further elaborated, for
example, in terms of the smooth signal redundancy hypothesis of Aylett &
Turk (2004). In simple terms, the hypothesis states that the total degree of
reduction used by speakers is understandable as the sum of two types of
redundancy: language redundancy (e.g., due to syntactic order or
grammatical agreement) and signal redundancy (e.g., several acoustic cues
on the same phonological distinction). Aylett & Turk assume that speakers
strive to keep the total redundancy constant, which means that a lower
1 All references involving Niebuhr in the following sections 2 and 3 that are not
included in the list of references can be found in Cangemi et al. (in press).
2 Note that there are actually two different types of reduction: (1) the amount of
energy invested in articulation and phonation, and (2) deviations from full/ideal
citation forms of consonants, vowels, and words. The two types are equated here.
The author is aware of the fact that this is probably a simplification (Yi Xu, pers.
comm.), but one that does not affect the line of argument presented here.
adding the notion of "phonetic essence", see Niebuhr & Kohler (2011) and
Kohler & Niebuhr (2011). Phonetic essence is a feature of complex sound
sequences like words, and the assumption is that, in speech reduction, those
sound characteristics of the sequences are maintained and reshaped as
articulatory prosodies that belong to the sequence's phonetic essence.
For instance, the German modal particle "eigentlich" (actually) is
characterized by palatality that pervades virtually the entire word: [aɪɡŋtlɪc].
An analysis of the Kiel Corpus of Spontaneous Speech (Peters 2005) showed
that "eigentlich" can be severely reduced, with only the initial diphthong
and, maybe, the middle nasal being left at the segmental level: [aɪȷ̃ (̃ ɲ̆)].
However, in these cases the palatality of the lost sound segments is
maintained by strengthening and lengthening the palatality in the initial
diphthong. That is, the closed-vowel element is produced longer and with a
higher F2 frequency. A perception experiment conducted by Niebuhr &
Kohler (2011) showed that listeners have no problems interpreting this
articulatory prosody of palatality and distinguishing highly reduced
"eigentlich" from the segmentally similar unreduced word "ein" (indef.
article). Likewise, the study by Kohler & Niebuhr (2011) addressed the word
"ihnen" - [i:nʲɪnʲ] (to you) - whose separate segmental representation
completely disappeared in the sentence frame "ich kann ihnen das ja mal
sagen" (I can mention this to you) produced by speaker TIS in the Kiel
Corpus of Spontaneous Speech. Despite the loss of all segments, the
phonetic essence of palatality of "ihnen" was kept and superimposed by the
speaker on the segments of "kann" and "das" that, as a result, change from
[kʰa̠nna̠s] to [k̟ʰɛ̈nʲnʲə̟s]. Evidence from a perception experiment showed that
listeners can reliably perceive the entire word "ihnen" on the basis of
[k̟ʰɛ̈nʲnʲə̟s] in the sentence frame "Ich ___ ja mal sagen". Moreover, as the
sound segments of [k̟ʰɛ̈nʲnʲə̟s] were successively replaced by those of
[kʰa̠nna̠s], the perceived wording of the stimulus sentence changed to "ich
kann das ja mal sagen" (I can mention this), without "ihnen".
Further phonetic essences that are reshaped as articulatory prosodies and
whose perceptual relevance has been experimentally demonstrated are
velarization (Niebuhr 2008), glottalization (Kohler 1999), and lip rounding
(Niebuhr & John 2014). In all these examples, the articulatory prosodies
were reduced representatives of at least entire syllables, and in the case of
Niebuhr (2008) the velarization even represented two full words, i.e. "auch
noch" (as well).
Articulatory prosodies almost always co-occur with duration cues in the
form of a compensatory lengthening of segments in the vicinity of
disappeared segments. However, while articulatory prosodies are sufficient
to make listeners perceive segmentally disappeared syllables or words, mere
segmental lengthening is not sufficient (cf. Niebuhr & Kohler 2011). It must
the combined result of tides, waves, and ripples. Tides are long-term settings
in the degree of reduction determined by, for example, the communication
channel, the situation, the physiological and pathological properties of
speaker and listener and the (acoustic) environment in which their
communication takes place. Waves and ripples represent additional
meaningful or otherwise systematic (e.g., tailored to integrate the listener's
top-down processes) short-term variations along the hypo-hyper scale,
associated with phrases, words, or single sounds and syllables. This
metaphor is compatible with later refinements of Lindblom's H&H theory,
such as the smooth signal redundancy hypothesis.
Figure 2: Reframing the tug-of-war metaphor in the form of the ocean metaphor of
Bolinger (1964).
whether or not such additional distinctions are necessary will be one of the
interesting tasks of follow-up studies on speech reduction.
In fact, the ocean perspective on reduction opens up a completely new
field of questions concerning, for example, the temporal interplay
(superposition, coordination, alignment) of reduction phenomena with
similar/different wavelengths, the limits of wave amplitudes, correlations
between types of waves and wave amplitudes as well as between wave
amplitudes and the overall (sound) energy level that is fed into the wave
system, and, finally, geographical and coastal (i.e., in the case of speech,
cultural and phonological) differences. These and many other questions have
the potential to stimulate, reconsider, and inspire research in speech
reduction for many more years.
Acknowledgments
First of all, I would like to thank Meghan Clayards and Meg Zellers for their
useful and insightful comments on earlier drafts of this paper. Moreover, I
am greatly indebted to Meg Zellers for taking the time to proof-read the
paper. Finally, special thanks are due to Yi Xu, Antonis Botinis and many
other participants of ExLing as well as all authors and co-editors of the
"Rethinking Reduction" volume for inspiring discussions and contributions
on the issue(s) of speech reduction.
References
Aylett, M., A.E. Turk. 2004. The smooth signal redundancy hypothesis. Language
and Speech 47, 31–56.
Bolinger, D. 1964. Around the Edge of Language. Harvard Educational Review 34,
282-293.
Browman, C.P., Goldstein, L. 1992. Articulatory phonology: An overview.
Phonetica, 49, 155-180.
Bryant, G.A. 2010. Prosodic contrasts in ironic speech. Discourse Processes 47, 545-
566.
Cangemi, F., M. Clayards, O. Niebuhr, B. Schuppler and M. Zellers (eds). in press.
Rethinking Reduction. Berlin: de Gruyter.
Clayards, M., O. Niebuhr. 2011. Production and Perception of Sibilant Assimilation:
Do French and English differ? Presentation at the Sound-to-Sense Closing
Workshop, Faculty Club Leuven, Belgium.
Clopper, C.G. and R. Turnbull. submitted. Exploring variation in phonetic reduction.
In F. Cangemi, M. Clayards, O. Niebuhr, B. Schuppler, M. Zellers (eds.),
Rethinking Reduction. Berlin: de Gruyter.
Dilley, L.C., M. Pitt. 2010. Altering context speech rate can cause words to appear
or disappear. Psychological Science 21, 1664–1670.
Docherty, G.J., J. Milroy, L. Milroy, D. Walshaw. 1997. Descriptive adequacy in
phonology: A variationist perspective. Journal of Linguistics 33, 275-310.
Fowler, C.A., J. Housum. 1987. Talkers’ signaling of “new” and “old” words in
speech and listeners’ perception and use of the distinction. Journal of Memory and
Language 26, 489-504.
Graupe, E., K. Görs, O. Niebuhr. 2014. Reduktion gesprochener Sprache -
Bereicherung oder Behinderung der Kommunikation? In O. Niebuhr
(ed.), Formen des Nicht-Verstehens, 155-184. Frankfurt: Peter Lang.
Holt, L.L., A.J. Lotto. 2010. Speech perception as categorization. Atten Percept
Psychophys 72, 1218-1227.
Hunnicutt, S. 1985. Intelligibility vs. redundancy - conditions of dependency.
Language and Speech 28, 47-56.
Junqua, J.-C. 1996. The Influence of Acoustics on Speech Production. Speech
Communication 20, 13-22.
Kohler, K.J. 1992. Gestural Reorganization in Connected Speech: A Functional
Viewpoint on "Articulatory Phonology". Phonetica 49, 205-211.
Kohler, K.J. 1999. Articulatory prosodies in German reduced speech. Proc. 14th
International Congress of Phonetic Sciences, 89-92, San Francisco, USA.
Kreidler, C.W. 1989. The pronunciation of English. Cambridge: Blackwell.
Ladd, D.R. 2008. Intonational Phonology. CUP.
Liberman, A.M. 1982. On finding that speech is special. American Psychologist 37,
148-167.
Lindblom, B. 1990. Explaining phonetic variation. In W. Hardcastle, A. Marchal
(eds), Speech production and speech modelling, 403-439. Dordrecht: Kluwer.
Local, J., J. Kelly, W.H.G. Wells. 1986. Towards a phonology of conversation:
Turn-taking in Tyneside English. Journal of Linguistics 22, 411–437.
Local, J. 2003. Variable domains and variable relevance: interpreting phonetic
exponents. Proc. TIPS, 101-106, Aix-en-Provence, France.
Mattingly, I.G. 1999. A short history of acoustic phonetics in the U.S. Proc. 14th
International Congress of Phonetic Sciences, 1-6, San Francisco, USA.
Niebuhr, O. 2008. The identification of highly reduced words by differential
segmental lengthening. Presentation at the First Nijmegen Speech Reduction
Workshop, MPI, Nijmegen, The Netherlands.
Nolan, F. 1992. The descriptive role of segments. In D.R. Ladd, G.J. Docherty
(eds.), Papers in Laboratory Phonology 2, 261–280. CUP.
Peters, B. 2005. The Database The Kiel Corpus of Spontaneous Speech. AIPUK
35a, 1-6.
Traunmüller, H., A. Eriksson. 2000. Acoustic effects of variation in vocal effort by
men, women, and children. JASA 107, 3438-3444.
Trede, D. 2011. Ist Ironie nur Prosodie? Zu lautlichen Reduktionen ironischer und
nicht-ironischer Äußerungen. BA thesis, Kiel University, Germany.
Watkins, K.E., A.P. Strafella, T. Paus. 2003. Seeing and hearing speech excites the
motor system involved in speech production. Neuropsychologia 41, 989–994.
Wright, R. 2003. Factors of lexical competition in vowel articulation. In J. Local, R.
Ogden, R. Temple (eds), Papers in Laboratory Phonology VI, 75-87. CUP.
Zellers, M. in press. Prosodic variation and segmental reduction and their roles in
cuing turn transition in Swedish. Language and Speech.
Visual search strategies and letter position
encoding in Russian
Svetlana Alexeeva
Laboratory for Cognitive Studies, St. Petersburg State University, Russia
Abstract
This article reports a visual search experiment involving Cyrillic letters of the
Russian alphabet. Results show that (1) the first and last letters of test arrays are
detected faster than neighboring letters, so that the letter search function resembles
an M-shaped curve; (2) letter quality influences response latencies. The results argue
for parallel letter-position encoding in Russian.
Keywords: visual word recognition, visual search task, Russian, Cyrillic script.
Introduction
Previous studies postulate that identifying letters and encoding their
positions within words are essential parts of written word recognition (for a
review, see Acha and Carreiras, 2014). Letters within words can be identified in
two ways: serially (letter by letter) or in parallel (so-called whole-word
processing) (Coltheart, 2006). One method that helps to shed light on
low-level orthographic processing is the visual search task (Hammond and
Green, 1982; Pitchford et al., 2008).
In the task, subjects are asked to decide (by pressing a key) whether or not a
predefined target character (letter or non-letter symbol) is part of a
subsequently presented stimulus string. The position in which the cued letter
appears in the string is manipulated and the response time is measured.
Detection latencies for each position of stimulus strings produce a search
function that is considered to reflect strategies of letter position encoding
(Ktori and Pitchford, 2010).
If the search function reveals a linear component, then it is thought that
serial processing comes into play (Pitchford et al., 2008). Usually, it means
that the letters appearing at the beginning of the word (e.g., the s and h in
shark) are identified faster than those appearing at the end (e.g., the r and k
in shark). If the end letter is detected faster than the preceding
letter (e.g., k vs. r in shark), this is taken as evidence of parallel letter
identification (Ktori and Pitchford, 2010).
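As a rough illustration of how a search function could be derived and inspected, the following Python sketch computes the mean latency per position, the slope of a least-squares line through those means (the linear component), and whether the final position is faster than the one before it. The function names and the trial values in the usage part are hypothetical and are not data from this study; the cited work may use different analyses.

```python
from statistics import mean


def search_function(trials):
    """Mean detection latency (ms) per position from (position, rt_ms) pairs."""
    by_position = {}
    for position, rt in trials:
        by_position.setdefault(position, []).append(rt)
    return [mean(by_position[p]) for p in sorted(by_position)]


def linear_slope(latencies):
    """Least-squares slope of latency against position (ms per position)."""
    positions = list(range(1, len(latencies) + 1))
    x_bar, y_bar = mean(positions), mean(latencies)
    numerator = sum((x - x_bar) * (y - y_bar)
                    for x, y in zip(positions, latencies))
    denominator = sum((x - x_bar) ** 2 for x in positions)
    return numerator / denominator


def final_letter_advantage(latencies):
    """True if the last position is detected faster than the preceding one."""
    return latencies[-1] < latencies[-2]


if __name__ == "__main__":
    # Hypothetical trials for a five-letter string: (position, reaction time in ms).
    trials = [(1, 600), (1, 620), (2, 700), (3, 680), (4, 760), (5, 720)]
    latencies = search_function(trials)
    print(latencies, linear_slope(latencies), final_letter_advantage(latencies))
```

A positive slope together with a final-letter advantage would correspond to the upward-sloping M-shaped pattern, whereas a positive slope without that advantage would look like a purely serial, left-to-right search.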
Previous studies on English show that the time-position dependency in
five-letter strings can be described by an upward-sloping M-shaped curve: the
first position is the fastest, but the reaction time in the second position is
slower than in the third, and in the fourth position it is slower than in the
fifth (Hammond and Green, 1982). Greek shows no latency decrease in the fifth
position compared with the fourth (Ktori and Pitchford, 2008). This result can
be explained by the transparency of Greek orthography: letters in words are
processed serially in languages with a transparent orthography, whereas in
languages with a deep orthography (such as English) parallel recognition takes
place (Pitchford et al., 2008).
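To make the two patterns concrete, the following Python sketch computes the
least-squares slope of latency over letter position and checks for the
M-shaped pattern described above. It is only an illustration: the latency
values are invented and are not data from any of the cited studies, and the
function names are hypothetical.

```python
from statistics import mean


def linear_slope(rts):
    """Least-squares slope (ms per position) of latency over positions 1..n."""
    positions = list(range(1, len(rts) + 1))
    p_bar, rt_bar = mean(positions), mean(rts)
    num = sum((p - p_bar) * (rt - rt_bar) for p, rt in zip(positions, rts))
    den = sum((p - p_bar) ** 2 for p in positions)
    return num / den


def is_m_shaped(rts):
    """M-form for five positions: position 1 is the fastest,
    position 2 is slower than 3, and position 4 is slower than 5."""
    if len(rts) != 5:
        raise ValueError("the M-shape check is defined for five positions")
    p1, p2, p3, p4, p5 = rts
    return p1 == min(rts) and p2 > p3 and p4 > p5


# Hypothetical mean latencies in ms for positions 1-5 (invented values).
m_like = [600, 680, 650, 720, 690]       # final-position advantage
linear_like = [600, 640, 680, 720, 760]  # steady increase, no advantage

for name, rts in (("M-like", m_like), ("linear-like", linear_like)):
    print(name, "slope:", linear_slope(rts), "M-shaped:", is_m_shaped(rts))
```

Both invented functions have a positive slope; only the first also shows the
dips at positions 3 and 5 that are read as a sign of parallel identification.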
Grapheme-phoneme correspondences in Russian are quite regular (although the
reverse is not true) (Grigorenko, 2013). We can therefore predict that serial
processing dominates and that the time-position function in Russian will be
closer to a straight line than to an M-shaped curve. This paper reports a
visual search experiment in Russian that tested this prediction.
Method
Participants
50 volunteers (age range 18-35 years) participated in the study. All of them
were naive to the purpose of the experiment.
Procedure
Subjects were tested individually in a quiet room. The experiment was run
using E-Prime software. On each trial, a lowercase target letter was
presented in the centre of the screen for 1000 ms, followed by a blank
screen. After 500 ms, the blank screen was replaced by a lowercase
test array, which remained in the centre of the screen until the response.
Participants were instructed to press the ‘/’ key if they noticed the cued
letter in the string of symbols and the ‘z’ key otherwise. They were
encouraged to respond as quickly and as accurately as possible.
Table 1. Mean reaction times (M, in ms) and t-test values (t) for positive
detections of the 33 Russian letters (L.). Significant effects are indicated
in bold.
L.  M    t       L.  M    t       L.  M    t       L.  M    t       L.  M    t
а   754  -2.3    ж   676   3.1    н   754  -2.4    ф   700   2.2    ы   753  -0.8
б   702   2.3    з   722  -0.7    о   662   5.4    х   699   1.0    ь   770  -2.0
в   717  -0.2    и   743  -2.5    п   740  -1.9    ц   749  -0.4    э   752  -2.9
г   733  -1.3    й   706   2.4    р   713   1.1    ч   720  -1.7    ю   721   0.1
д   703   1.8    к   751  -3.6    с   712   0.3    ш   696   2.4    я   733  -1.7
е   728  -1.0    л   746  -1.6    т   726  -0.6    щ   702   1.2
ё   629   8.3    м   741  -1.1    у   742  -1.0    ъ   714  -0.5
Acknowledgements
The project is supported by the Russian Science Foundation (#14-18-02135).
References
Acha, J. and Carreiras, M., 2014. Exploring the mental lexicon: A methodological
approach to understanding how printed words are represented in our minds. The
Mental Lexicon 9, 196–231.
Coltheart, M., 2006. Dual Route and Connectionist Models of Reading: An
Overview. London Review of Education 4, 5–17.
Grigorenko, E., 2013. If John were Ivan, would he fail in reading? In: Handbook of
Orthography and Literacy. Routledge, pp. 303–320.
Hammond, E.J. and Green, D.W., 1982. Detecting targets in letter and non-letter
arrays. Canadian Journal of Psychology 36, 67–82.
Ktori, M. and Pitchford, N.J., 2008. Effect of orthographic transparency on letter
position encoding: A comparison of Greek and English monoscriptal and
biscriptal readers. Language and Cognitive Processes 23, 258–281.
Ktori, M. and Pitchford, N.J., 2010. Letter position encoding across deep and
transparent orthographies. In: Reading and Dyslexia in Different Orthographies.
Psychology Press, pp. 69–86.
Lyashevskaya, O.N. and Sharov, S.A., 2009. Frequency dictionary of modern
Russian. Azbukovnik, Moscow [in Russian].
Pitchford, N.J., Ledgeway, T. and Masterson, J., 2008. Effect of orthographic
processes on letter position encoding. Journal of Research in Reading 31, 97–116.
Emergence of word prosody in (Seoul) Korean
Angeliki Athanasopoulou, Irene Vogel
Department of Linguistics and Cognitive Science, University of Delaware, USA
Abstract
It has been argued that Korean has recently developed a word-initial F0
distinction that partially replaces the VOT distinction among the three stop
categories: lax, aspirated and tense. This change has been characterized as
tonogenesis, but since the contrast is not found on all syllables, it seems
more consistent with a pitch accent language than with a tone language. We
investigate the prosodic patterns of trisyllabic words to assess a) whether the
VOT-to-F0 change is only word-initial or also occurs in other syllables, and
b) whether there is evidence of word-level prominence on one syllable,
supporting a pitch accent interpretation. The data from 10 Korean speakers
yield conflicting evidence for both tonal and pitch accent prosodic systems.
Key words: tonogenesis, VOT, pitch accent, Korean
Introduction
Korean is considered a language lacking word prosodic properties (i.e.,
stress or tone). It has recently been shown that a change is in progress
whereby the three-way stop distinction (lax, aspirated, tense) is being
reduced to two (Silva 2006, Wright 2008, Kang 2014). Specifically,
word-initially, the VOT contrast between aspirated and lax consonants is being
replaced by high and low F0, respectively, on the following vowel. This
phenomenon is referred to as tonogenesis; however, for a language to have a
fully developed tonal system, we would expect tone contrasts to emerge not
only word-initially but also elsewhere in the word, as for example in
Vietnamese (Haudricourt 1954, Thurgood 2002).
In the present study, we examine the acoustic properties of CV syllables
with the three consonant types in all positions in 3-syllable words to
determine, first, if there is a VOT-to-F0 change in Syllable 1, and then, if
there is evidence of such a change beyond the first syllable. Thus, the first
prediction is that, word-initially, the Vowels after a Lax onset (LV) will
have lower F0 than those after an Aspirated onset (AV), while the
Consonants that are considered Lax (LC) and Aspirated (AC) will no
longer differ in VOT. The second prediction is that if this process is truly
tonogenetic, the VOT-to-F0 change will also be found in Syllables 2 and 3.
Method
We collected a corpus of 2700 target vowels (/i, o, a/) in initial, medial and
final syllables of real trisyllabic words. The vowels appeared in syllables
with onsets that varied by consonant type, e.g., lax [pigida] ‘draw’, aspirated
[pʰibuʨʰi] ‘relatives’, tense [p*it*agi] ‘skew’. Two types of simple dialogues
were used to elicit the target words in focus and non-focus contexts. The
target vowels appeared in the responses, as illustrated in Table 2, where
“XXX” is the word containing the relevant vowel.
Table 2. The sentences for the two focus contexts; focus = bold; target = XXX.
Results
Our findings corroborate the results of previous studies showing that,
word-initially, F0 has replaced the VOT distinction between aspirated and lax
consonants. In both focus conditions, LC and AC had similar VOTs
(~50 ms), but LV had a lower F0 than AV. Moreover, the tense stop (TC), as
expected, had the shortest VOT (20 ms), and the vowel after the tense onset
(TV) had a mid F0, roughly between the F0 of LV and AV. In addition to the
mean F0, it is interesting to note that while the F0 contours of LV and AV are
relatively flat, the F0 of TV has a rising contour, a difference probably due
to the longer duration of TV (67 ms vs. 49-55 ms). The F0 and duration
properties are presented in Figure 1.
[Figure: F0 Contours (Non-Focus); series: Aspirated (AV), Lax (LV); y-axis:
Normalized F0 (Hz).]
Figure 1. F0 contours and Duration for each onset type and syllable position.
References
Haudricourt, André-Georges. 1954. “De l'origine des tons en vietnamien.” Journal
Asiatique 242: 69-82.
Hombert, Jean-Marie. 1975. Towards a Theory of Tonogenesis: an Empirical,
Physiologically and Perceptually based Account of the Development of Tonal
Contrasts in Language. Doctoral Dissertation: University of California,
Berkeley.
Jun, Sun-Ah. 2005. “Korean intonational phonology and prosodic transcription.” In
Prosodic Typology: The Phonology of Intonation and Phrasing, by Sun-Ah Jun,
201-229. Oxford University Press.
Kang, Yoonjung. 2014. “Voice Onset Time merger and development of tonal
contrast in Seoul Korean stops: a corpus study.” Journal of Phonetics 45: 76-90.
Silva, David. 2006. “Acoustic evidence for the emergence of tonal contrast in
Contemporary Korean.” Phonology 23: 287-308.
Thurgood, Graham. 2002. “Vietnamese and tonogenesis: revising the model and the
analysis.” Diachronica 19 (2): 333-363.
Vogel, Irene, Angeliki Athanasopoulou, and Nadya Pincus. 2015. “Acoustic
properties of prominence in Hungarian and the Functional Load Hypothesis.” In
Approaches to Hungarian 14, by Katalin Kiss, Balázs Surányi and Éva Dékány,
267-292. Amsterdam: John Benjamins.
Wright, Jonathan. 2008. The phonetic contrast of Korean obstruents. Doctoral
dissertation: University of Pennsylvania.
Voice Activity Detector (VAD) based on long-
term phonetic features
Andrey Barabanov1, Daniil Kocharov2, Sergey Salishev3, Pavel Skrelin2,
Mikhail Moiseev4
1 Department of Cybernetics, Saint-Petersburg State University, Russia
2 Department of Phonetics, Saint-Petersburg State University, Russia
3 Department of Informatics, Saint-Petersburg State University, Russia
4 Intel Labs, Intel Corporation, USA
Abstract
We propose a VAD that uses long-term, phonetically motivated features with
auditory masking and a pre-trained decision-tree-based classifier, which allows
capturing the syllable-level structure of speech and discriminating it from
common noise types. On the test dataset, the algorithm demonstrates almost 100%
acceptance of clear voice for English, Chinese, Russian and Polish speech and
100% rejection of stationary noises, independently of loudness, at low
computational cost.
Key words: Voice Activity Detector, classification, decision tree ensemble, auditory
masking, phonetic features
Introduction
Accurate, low-complexity VAD is important for many applications in
Consumer Electronics, Wearables, Smart Home and other areas,
where the VAD serves as a low-power gatekeeper for a more complex and
energy-consuming Automatic Speech Recognition (ASR) system.
Our VAD approach is based on the detection of signal segments with
formants in the spectrum. The method cuts off all voiceless consonants and
the majority of voiced ones. This should be compensated for by treating as
speech any part of the signal that includes some unvoiced segments preceding
and following a vocalized sequence. The duration of such segments is
language-dependent. On the one hand, it should be long enough to contain
consonant clusters. On the other hand, it should be shorter than inter-phrase
pauses. Languages differ in their consonant-to-vowel ratios and in the
maximum length of consonant clusters, so the length of the consonant
segment has to vary from language to language. The pause length is less
language-dependent and more speaker-dependent. From this point of view,
the duration of the consonant segment should be about 200–250 ms.
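One way to picture this compensation is as a symmetric extension of the
frames in which a vocalized sequence was detected. The Python sketch below is
an illustration under stated assumptions, not the authors' implementation: the
10 ms frame step, the function name and the symmetric padding are hypothetical
choices; only the 200 ms context length is taken from the text.

```python
def extend_voiced(flags, frame_ms=10, context_ms=200):
    """Mark as speech every frame lying within context_ms of a voiced frame.

    flags      -- per-frame booleans, True where a vocalized segment was found
    frame_ms   -- frame step in ms (assumed value, not given in the paper)
    context_ms -- unvoiced context kept before and after voiced frames
    """
    k = context_ms // frame_ms  # context length expressed in frames
    n = len(flags)
    out = [False] * n
    for i, voiced in enumerate(flags):
        if voiced:
            # Cover the voiced frame plus k frames on each side.
            for j in range(max(0, i - k), min(n, i + k + 1)):
                out[j] = True
    return out


# 0.5 s of silence, 0.1 s of vocalized signal, 0.5 s of silence.
frames = [False] * 50 + [True] * 10 + [False] * 50
speech = extend_voiced(frames)
print(sum(frames) * 10, "ms voiced ->", sum(speech) * 10, "ms marked as speech")
```

With these settings a 100 ms vocalized stretch is widened to 500 ms, which
leaves room for a consonant cluster on either side while remaining shorter
than a long inter-phrase pause.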
We propose to use long-term (200 ms) speech statistics in combination
with a pre-trained complex non-linear classifier, which allows capturing the
syllable-level structure of speech and distinguishing it from common noises.
Proposed algorithm substantially outperforms competitive solutions in
Comparison
For comparison, we used two state-of-the-art VADs: Google WebRTC VAD
and Nuance SREC VAD. For testing, we used sound files that were not used at
all in training. We separately performed False Accept testing on a noise
database and False Reject testing on a speech database with various SNRs. For
the false accept test, we used the DEMAND database, which contains background
noises for 18 environments (Table 1). We conclude that the new VAD outperforms
its competitors. We also tested the false accept rate on 3 tracks of the Rock,
Pop and Classical music genres that were not used in training. We conclude that
the new algorithm substantially outperforms its competitors, although its false
accept rate on music is still about 20%.
For false reject testing, we used a speech database of 5 min recordings in
four languages (English, Chinese, Russian and Polish, in accordance with their
consonant coefficient), with male and female speakers for each language and
manual VAD markup. Noise was synthetically added at various SNRs,
calculated as the ratio of total speech power to total noise power after a
high-pass filter with a 100 Hz cutoff. We conclude that the new VAD is highly
accurate and insensitive to language and speaker at high SNR (10 dB and
above). We tested with various noises (Table 2).
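The noise-mixing step can be sketched as follows. This is a minimal Python
illustration of scaling a noise signal so that the ratio of total speech power
to total noise power equals a target SNR; it is not the authors' code. The
function names are hypothetical, the signals are synthetic, and the 100 Hz
high-pass filter mentioned above is omitted for brevity.

```python
import math


def total_power(x):
    """Sum of squared samples."""
    return sum(s * s for s in x)


def noise_gain_for_snr(speech, noise, snr_db):
    """Gain g such that 10*log10(P_speech / P_(g*noise)) equals snr_db."""
    ratio = 10 ** (snr_db / 10)
    return math.sqrt(total_power(speech) / (total_power(noise) * ratio))


def mix_at_snr(speech, noise, snr_db):
    """Add the scaled noise to the speech, sample by sample."""
    g = noise_gain_for_snr(speech, noise, snr_db)
    return [s + g * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noise):
    """SNR in dB as total speech power over total noise power."""
    return 10 * math.log10(total_power(speech) / total_power(noise))


# Synthetic stand-ins for one second of audio at 8 kHz (illustration only).
fs = 8000
speech_sig = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]
noise_sig = [((t * 37) % 17 - 8) / 8 for t in range(fs)]

gain = noise_gain_for_snr(speech_sig, noise_sig, 10)
mixed = mix_at_snr(speech_sig, noise_sig, 10)
print(round(measured_snr_db(speech_sig, [gain * n for n in noise_sig]), 6))
```

Because the gain is derived from the total powers of the two signals, the
same routine can be reused for any SNR in the test grid.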
Table 1. False accept rate comparison in % for different environmental noises and
music.
The new VAD algorithm is highly accurate in car noise, with a FAR of about 1%
at an SNR of 0 dB. For non-stationary noises, it demonstrates similar
performance down to an SNR of 10 dB and degrades at lower SNRs on babble noise.
This correlates with the subjective intelligibility of the speech.
Conclusion
The proposed algorithm substantially outperforms competitive solutions in
various environments and demonstrates, on the test dataset, almost 100%
acceptance of clear voice and 100% rejection of stationary noises, with a 15%
complexity increase compared to an MFCC-based ASR front-end. The
algorithm has a latency of 200 ms, which is not acceptable for some
scenarios such as VoIP. In some cases, the algorithm falsely accepts
noises as voice: clatter of dishes, sound of flowing water, resonant strokes,
tonal beeps, babble noise and bird songs. The algorithm falsely rejects speech
in the presence of high-amplitude non-stationary noise, especially babble
noise.
Table 2. Proposed VAD false reject rate in % for different environmental noises and
SNR.
Table 3. False reject rate comparison in % for different environmental noises and
SNR.
References
Fant, G. 1960. Acoustic theory of speech production: With calculations based on x-
ray studies of Russian articulations.
Fastl, H., Zwicker, E. 2006. Psychoacoustics: facts and models, vol. 22. Springer
Science & Business Media.
Zhou, Z.H. 2012. Ensemble methods: foundations and algorithms. CRC Press.
The identification of two Algerian Arabic dialects
by prosodic focus
Ismaël Benali
CLILIAC-ARP, Université Paris Diderot, France
Abstract
The purpose of this research is to show that it is easier to identify the
prosody of the Algiers and Oran dialects when a focus is produced. For this
study, we compared prosodic features associated with different types of focus:
broad focus, emphatic narrow focus, contrastive narrow focus and interrogative
focus. The acoustic analysis shows that recurrent prosodic patterns
differentiating the two dialects were observed in narrow and interrogative
focus. The analysis of the interaction between the identification of the two
dialects and the four types of focus showed that Algiers and Oran speakers are
better identified when their utterances are produced with narrow focus placed
at the edge of an intonation phrase or with interrogative focus.
Key words: dialectal variations, Algerian Arabic, intonation, focus
Introduction
Several studies have shown that dialectal varieties can be differentiated
on the basis of prosody alone. Suprasegmental parameters such as speech rate,
F0 register and range, F0 excursion and F0 alignment are sufficient to
distinguish and identify dialects.
The Algiers and Oran varieties are two urban dialects of Algeria. They
are characterized by regional accents marked segmentally and prosodically.
In a previous study (Benali, 2004), it appeared that Algiers speakers
produced more melodic variations than Oran speakers, who tended to
produce more syllabic lengthening. We also found that the intonation patterns
which characterize the Algiers and Oran varieties are marked more clearly when
the speaker speaks with emphasis and implication. To study this
phenomenon, we compared prosodic features (mainly F0 movements)
associated with different types of focus: first, broad focus (emphasis on the
whole or a part of an utterance); then, emphatic narrow focus (strong emphasis
on a specific item of an utterance); then, contrastive narrow focus (emphasis
on a contrasting item in an utterance); and finally, interrogative focus
(emphasis on the linguistic element on which the question bears).
In most languages, narrow focus is marked by an F0 rise, often
accompanied by an increase in duration and intensity (Hirst and Di Cristo,
1998). In a comparison of the acoustic realizations of contrastive focus
in three Arabic dialects (Moroccan Arabic, Kuwaiti Arabic and
Yemeni Arabic), Yeou et al. (2007) showed that these
dialects share the same strategy in the realization of contrastive focus,
consisting of a rising-falling movement. This melodic contour was more
locally defined in Yemeni and Kuwaiti Arabic, while it may span the entire
focused word in Moroccan Arabic. Moroccan Arabic is distinguished by a
significant effect of syllabic structure on F0 peak alignment: the peak occurs
within the accented syllable when it is closed and outside it when it is open.
In Kuwaiti and Yemeni Arabic, the peak occurs within but near the end of the
accented vowel in both open and closed syllables. In Egyptian Arabic,
Hellmuth (2011) showed an increase of F0 in focus and a
compression of F0 in the following words. In Tunisian Arabic
(Bouchhioua, 2009), focus also increases the duration of both the stressed
and the unstressed syllable. Stressed final syllables are lengthened
more, and the F0 and intensity of the stressed syllable increase under
focus.
Methodology
20 Algiers speakers (15 men and 5 women) and 20 Oran speakers (10 men
and 10 women) were recorded in their respective cities. The recordings include
spontaneous and read speech. Focus was either naturally produced or elicited.
In a first experiment we isolated the prosodic information, using a
method of delexicalization that filters out speech frequencies above 400 Hz.
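Delexicalization of this kind amounts to low-pass filtering. The Python sketch
below shows one possible realisation with a Hamming-windowed sinc FIR filter;
it is an assumption-based illustration, since the paper does not specify the
filter type or order. The function names, the 101-tap length and the 8 kHz
sampling rate are hypothetical.

```python
import math


def lowpass_fir(cutoff_hz, fs_hz, num_taps=101):
    """Hamming-windowed sinc low-pass coefficients with unity gain at 0 Hz."""
    fc = cutoff_hz / fs_hz       # cutoff in cycles per sample
    mid = (num_taps - 1) / 2     # centre tap index (num_taps should be odd)
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2 * fc
        else:
            ideal = math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_fir(signal, taps):
    """Filter the signal; output has the same length, edges are zero-padded."""
    half = len(taps) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + half - j
            if 0 <= idx < n:
                acc += t * signal[idx]
        out.append(acc)
    return out


def rms(x):
    """Root-mean-square amplitude."""
    return math.sqrt(sum(v * v for v in x) / len(x))


fs = 8000
taps_400 = lowpass_fir(400, fs)
# Synthetic test tones: 100 Hz (F0 range) and 2000 Hz (segmental detail).
low_tone = [math.sin(2 * math.pi * 100 * n / fs) for n in range(2000)]
high_tone = [math.sin(2 * math.pi * 2000 * n / fs) for n in range(2000)]
print(round(rms(apply_fir(low_tone, taps_400)[200:1800]), 3))
print(round(rms(apply_fir(high_tone, taps_400)[200:1800]), 3))
```

A component in the F0 range passes almost unchanged, whereas energy well above
400 Hz, which carries most segmental information, is strongly attenuated.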
In a second experiment we manipulated unfiltered speech: we
transposed the F0 variations and vowel durations of the read statements of one
dialect onto the other and vice versa. We presented these two types of
stimuli to 30 listeners (neither from Algiers nor from Oran), who had to
identify the dialects.
The analysis and the acoustic manipulations were carried out either with the
speech analysis/resynthesis program ‘WinPitch’ (Martin, 2000) or with
‘Praat’.
Results
Acoustic analysis results
Spontaneous speech
Narrow focus is marked prosodically in both dialects. The Algiers variety is
characterized by rising-falling contours and especially by a final melodic
drop. The Oran dialect is characterized by a lengthening of stressed syllables
with lowered contours which are generally flat. F0 peak alignment is usually
on the pre-nuclear syllable in the Algiers dialect.
Read speech
The statement used in read speech is: "Ali (he) is sick." [ʕali rah mri:dˁ]; the
speakers were asked to vary the type of focus.
The acoustic analysis shows that recurrent prosodic patterns
differentiating the two dialects were observed in only two types of
focus: emphatic narrow focus when it is at the edge of an intonation
phrase, and interrogative focus. Emphatic narrow focus is produced in the
Algiers dialect with a high and falling contour on the last stressed syllable.
In the Oran dialect, this focus is realized with a contour which is either
flat or slightly rising on the last stressed syllable (figure 1). In both
dialects the stressed syllable is lengthened.
In interrogative focus, Algiers speakers produce an amplified rising-falling
contour, while Oran speakers produce on the last syllable a rising
contour preceded by a falling one (figure 2). The realization of contrastive
focus varied across speakers of the same dialect. Broad focus was realized
with similar intonation patterns in both dialects.
Figure 1. Emphatic narrow focus produced by Algiers (left) and Oran speakers
(right).
Figure 2. Interrogative focus produced by Algiers (left) and Oran speakers (right).
Conclusion
Emphatic narrow focus and interrogative focus distinguish the Algiers
and Oran dialects, and the dialects are better identified in these types of
focus.
References
Benali, I. 2004. Le rôle de la prosodie dans l'identification de deux parlers algériens:
l'algérois et l'oranais. Workshop MIDL.
Bouchhioua, N. 2009. Stress and Accent in Tunisian Arabic. First International
Conference on Intonational Variation in Arabic, 28-29.
Hellmuth, S. 2011. Acoustic cues to focus and givenness in Egyptian Arabic.
Instrumental Studies in Arabic Phonetics, 319, 301.
Hirst, D., Di Cristo, A. 1998. Intonation systems: a survey of twenty languages,
Cambridge University Press.
Martin, Ph. WinPitch 2000: a tool for experimental phonology and intonation
research. Proceedings of the Prosody 2000 Workshop, 2000.
Yeou, M., Embarki, M., Al-Maqtari, S. 2007. Contrastive focus and F0 patterns in
three Arabic dialects. Nouveaux cahiers de linguistique française, 317.
Intonation and polar questions in Greek revisited
Antonis Botinis1, Anthi Chaida1,2, Olga Nikolaenkova3, Elina Nirgianaki1,2
1 Lab of Phonetics & Computational Linguistics, University of Athens, Greece
2 Faculty of Primary Education, University of Athens, Greece
3 Department of General Linguistics, Saint Petersburg State University, Russia
Abstract
This is a production study of intonation and polar questions in Greek. The results
indicate that there is a fairly invariable rising-falling tonal structure at the right edge
of polar questions. However, the alignment of both tonal rising and tonal peak
depends on the position of focus as well as lexical stress production. Thus, in the
context of initial and medial focus productions, the tonal rising is aligned with the
onset of the final stressed syllable whereas, in the context of final focus production,
the tonal rising is aligned with the onset of the last syllable regardless of the position
of lexical stress. On the other hand, the tonal peak is aligned with the post-stressed
syllable in the context of initial and medial focus productions whereas, in the context
of final focus production, the tonal peak is aligned with the nucleus of the last
syllable. However, the earlier the lexical stress production, the earlier the tonal rising
as well as the tonal peak in all focus contexts.
Key words: polar questions, intonation, Greek, focus, tonal associations
Introduction
Sentence type intonation in Greek shows several basic characteristics. Thus,
statements and polar questions, much as in other languages such as Italian
and Russian, are hardly distinguished by any correlate other than intonation. Lexical
stress production in statements may be associated with a tonal rise in
prefocus position whereas, in focus position, a local tonal expansion is
followed by a postfocus tonal flattening (e.g. Botinis 1989). In polar
questions, on the other hand, there is no tonal expansion of focus application
but a rising-falling tonal characteristic at the right edge (e.g. Chaida 2010).
In a recent study (Chaida, Sotiriou, Kontostavlaki 2016, this volume), the
rising-falling tonal characteristic of polar questions at the right edge is fairly
evident, but a tonal expansion of focus application is also evident, leading us
to the question: what are the basic characteristics of polar question intonation
in Greek? In addition, particular questions are addressed with reference to
the associations of variable lexical stress as well as focus applications. In this
study, we have developed a new question-question methodology, according
to which a first wh-question elicits a polar (yes-no) question. We think that
this methodology is straightforward and may be applied to other languages
in principle, especially to languages with no other morphological and/or
syntactic means to produce polar questions but intonation.
Experimental methodology
The speech material of the present study consists of two series of a
question-question methodological paradigm sequence. The first question is
an elicitation wh-question, which assigns a specific focus to the second
question, i.e. the target question, which is a polar question with variable
final lexical stress assignment on one of the last three syllables (Table 1).
In the first series, the target question is a full question, whereas, in the
second series, the target question is an elliptical one, corresponding to the
final prosodic word of the full question (Figures 1-4).
Five female speakers, 20-40 years old, with standard Athenian
pronunciation, produced the speech material at a normal tempo in a
sound-treated room at the Athens University Phonetics Laboratory. The speakers
read the speech material from a piece of paper, first the elicitation question
followed by the target question.
The speech material analysis was carried out with Praat, with several
annotation tiers. In this report, we have concentrated on one tier, i.e. the
stressed vs. unstressed prosodic distinction, and the speech material has been
normalized with the ProsodyPro tool (Xu 2013).
Table 1. Elicitation questions (left) and target questions (right) with variable final
lexical stress as well as neutral and variable focus assignments (bold letters).
Elicitation question                   Target question
1.1 [pça ðuˈlevi sti ˈmadova]?         1.1 [i ˈnana ðuˈlevi sti ˈmadova]?
    ‘Who works in Mantova’?                ‘Nana works in Mantova’?
1.2 [pça ðuˈlevi sto miˈlano]?         1.2 [i ˈnana ðuˈlevi sto miˈlano]?
    ‘Who works in Milano’?                 ‘Nana works in Milano’?
1.3 [pça ðuˈlevi sto banaˈma]?         1.3 [i ˈnana ðuˈlevi sto banaˈma]?
    ‘Who works in Panama’?                 ‘Nana works in Panama’?
2.1 ti ˈkani i ˈnana sti ˈmadova?      2.1 [i ˈnana ðuˈlevi sti ˈmadova]?
    What does Nana in Mantova?             Nana works in Mantova?
2.2 ti ˈkani i ˈnana sto miˈlano?      2.2 [i ˈnana ðuˈlevi sto miˈlano]?
    What does Nana in Milano?              Nana works in Milano?
2.3 ti ˈkani i ˈnana sto banaˈma?      2.3 [i ˈnana ðuˈlevi sto banaˈma]?
    What does Nana in Panama?              Nana works in Panama?
3   pu ðuˈlevi i ˈnana?                3.1 [i ˈnana ðuˈlevi sti ˈmadova]?
    Where works Nana?                      Nana works in Mantova?
                                       3.2 [i ˈnana ðuˈlevi sto miˈlano]?
                                           Nana works in Milano?
                                       3.3 [i ˈnana ðuˈlevi sto banaˈma]?
                                           Nana works in Panama?
Acknowledgements
Thanks to Athens University S.A.R.G for economic support.
References
Botinis, A. 1989. Stress and Prosodic Structure in Greek. Lund University Press.
Castelo, J., Frota, S. Forthc. The yes-no question contour in Brazilian Portuguese.
Chaida, A. 2010. Production and Perception of Intonation and Sentence Types in
Greek. PhD Thesis, University of Athens.
Chaida, A., Sotiriou, A., Kontostavlaki, A. 2016. Intonation and polar questions in
Greek. (this volume).
Grice, M., Ladd, R., Arvaniti, A. 2000. On the place of phrase accents in intonational
phonology. Phonology 17, 143-185.
Xu, Y. 2013. ProsodyPro — A Tool for Large-scale Systematic Prosody Analysis.
Proc. TRASP 2013, 7-10. Aix-en-Provence, France.
The imprint of disposition in social interaction
Mark Campana
Dept of English and American Studies, Kobe City University, Japan
Abstract
This study considers how listeners perceive and interpret the disposition of others
through non-linguistic vocal cues. Changes in F0 and pitch span (measured against a
‘running’ mean of the previous 15 seconds), constellations of sequential tones, and
emergent speech rhythms index recognizable states of positive/negative valency,
desire, knowledge and/or processing, which together constitute emotional display
(these same states correlate with mental predicates in the composition of emotion
words). Excerpts of natural conversation were converted to ‘iterant speech’, i.e.
speech devoid of lexical content. Listeners were invited to identify speaker
disposition, and their ability to do so was remarkably accurate. The results lend
support to a theory of vocal affect based on sound-types, rather than sounds.
Keywords: disposition; emotional display; mental predicates; iterant speech
Introduction
This paper addresses the imprint of disposition in social interaction.
Disposition is taken to mean something like a frame-of-mind which both
governs the behavior of the person who has it, and is evaluated by those who
witness it. One can have a certain disposition (where certain is replaced by
an adjective) or be of a certain disposition (idem). The question we address
here is how the listener comes to realize that a speaker has (or is of) a certain
disposition based on tone-of-voice. In principle, the answer will be the same
as how the listener grasps the shifting mental states of the speaker in the
course of interaction, but there are some differences.
To illustrate the phenomena, a person can have e.g. a sunny disposition
or a surly one, be of a grumpy or a fearful disposition. Other
plausible/attested collocations are thoughtful, cheerful, kind, easy-going—
with positive valency—or angry, taciturn—with negative. One can also be
predisposed towards a proposition with positive/negative content—e.g.
judging someone harshly or with kindness.
Consider next the concept of an ‘imprint’, which is different from an
impression. An impression of e.g. a person’s character can be formed after a
single encounter. An imprint typically results from several encounters, i.e. it
includes memories of previous ones. In it, impressions are weighed and
integrated in a more substantive schema. Basically, it takes longer to make
an imprint of disposition, but in theory it can be appraised after a single
encounter.
Theory
What is tone-of-voice? First, it is ‘about’ tones (or pitches) but the array of
sounds at the disposal of the speaker has a temporal aspect, an organizational
one, and then there is the issue of voice quality. At the same time, we
understand tone-of-voice to be the audible analogue of emotion. The litany
of speech sounds in social interaction is essentially infinite. The oral cavity
alone is designed such that even minute flexions of a single muscle (or
muscle-group) can produce a complex, distinctive sound that is potentially
‘meaningful’ for the assessment of the speaker’s mental state. In terms of
efficiency, it would make sense for such sounds to be organized into sound-
types for the purpose of transmitting and understanding vocalized meaning.
Categorization is a cognitive skill at which humans (and some other species)
have proved to be adept. In this paper, we test a specific theory of sound-
types that index mental ‘sub-states’ of positive/negative valency, desire,
knowledge and processing, which together constitute an emotional display
(Wierzbicka 1999). Inasmuch as changes in perceived disposition correlate
with controlled modulation of sound-type parameters, the theory can be
verified. What then is emotion? This is not a simple question either, but we
may start by following Wierzbicka (1999) and others in assuming that most
‘emotions’ include a ‘thinking’ part, as well as a ‘feeling’ one. In her model
of semantics (NSM), words like disappointment, afraid, happiness, etc. are
cast as ‘cognitive scenarios’, short narratives made up of simple words and
propositions. Among the set of ‘mental predicates’ which play a key role in
every scenario are want, know, feel and think. Together with good, bad
and not (also from the metalanguage) we derive the following mental states,
any combination of which can be heard in the expression of emotion itself
(abbreviated as WXYZ):
(1) Mental states (adapted from Wierzbicka 1999)
W wanting/not wanting (takes an object)
X knowing/not knowing (takes an object)
Y feeling good/bad (about something)
Z thinking (no negative counterpart)
The next step is to match the types of sounds that make up tone-of-voice
with WXYZ. This is only an approximation, whereby a given sound type is
just a ‘leading indicator’ of a mental state, not necessarily the only one.
Combinations of sounds (as well as the meaning of words) can also index a
mental state. That said, we propose that voice qualities—broadly defined—
are used to signal states of wanting or not wanting. Intensity of F0 (volume)
counts as a voice quality, along with upper partials (timbres) and non-
standard vocal gestures, such as ‘clipped’ endings, etc.
Short tunes or melodies—sequences of tones—are used to signal
knowing or not knowing. Aizuchi (backchannels) are typical: even when the
‘tune’ appears to have a single tone, it is juxtaposed against that of previous
speech. Consider what it sounds like to say “I don’t know” in your language.
Echoes of the same can be heard in longer stretches of speech as well.
Next, consider the mental states of feeling good or feeling bad. These
correspond most closely to valency, as it is known in emotion research.
Pitches and pitch combinations are primarily responsible for signaling these
states. Cook (2003) develops the idea that valency follows from three-tone
chordal structure, and there is no reason to dispute this. Emotional displays
do unfold quickly, so it is likely that even tones in sequence are perceived as
simultaneous, i.e. in the ‘psychological now’.
Finally, we propose that rhythms and timing units in general (tempos,
pauses, hesitations etc.) accurately reflect the mental activity of thinking. It
is not enough to simply demonstrate that thinking is taking place; the style
of presentation and the grouping of syllables are important too, influenced in part by
the choice of words.
To summarize, the mental predicates that serve to characterize emotion
words in Wierzbicka’s semantic system correspond to real mental states that
occur in the display of emotion. In theory, such states could be indicated by
facial expression, body movements (including gesture), or simply words.
Tone-of-voice is just another means of expression, where each mental
state/activity is indicated by a sound type, shown below (wxyz):
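As a compact restatement of the correspondences proposed in the preceding paragraphs, the pairing of each mental state with its 'leading indicator' sound type can be written as a small lookup table. This is only an illustrative sketch: the Python identifiers (MENTAL_STATE_TO_SOUND_TYPE, leading_indicator) are assumptions introduced here, and the wording of the labels paraphrases the text.

```python
# Illustrative lookup table; identifiers are hypothetical, labels paraphrase the text.
MENTAL_STATE_TO_SOUND_TYPE = {
    "W": ("wanting/not wanting",
          "voice qualities (intensity, timbre, non-standard vocal gestures)"),
    "X": ("knowing/not knowing",
          "short tunes or melodies (sequences of tones)"),
    "Y": ("feeling good/bad",
          "pitches and pitch combinations"),
    "Z": ("thinking",
          "rhythms and timing units (tempos, pauses, hesitations)"),
}


def leading_indicator(state: str) -> str:
    """Return the sound type proposed as the leading indicator of a mental state."""
    return MENTAL_STATE_TO_SOUND_TYPE[state.upper()][1]


for key, (state, sound_type) in MENTAL_STATE_TO_SOUND_TYPE.items():
    print(f"{key}: {state} -> {sound_type}")
```

Since a sound type is only a leading indicator, a fuller model would presumably map each state to several weighted cues rather than to a single entry.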
Data, methods
In the course of daily interaction, listeners can appraise the disposition of a
speaker based on tone-of-voice. Can naïve subjects reach similar conclusions
in a clinical experiment? Possibly, but not necessarily: every action depends
on individual experience, social consequences, and other factors. It isn’t
fruitful to devise an experiment along these lines. Nevertheless, listeners
may be able to recognize repeated patterns in a speaker’s voice on different
occasions, and trained ones can identify and describe them. Gathering such
data from a longitudinal study is optimal, but impractical. In the tasks
reported on here, listeners were presented with stance utterances from
speakers over a range of topics, and asked to appraise their disposition. In
order to control for word meaning though, the stance utterances were
converted to ‘iterant’ form, leaving only prosody.
In its core meaning, a stance is a physical event whereby the stance-taker
assumes a bodily position that signals a clear intention to the audience. One
can easily imagine how something like ‘defiance’ is acted out by assuming a
defensive posture. In current sociolinguistics, the concept of stance has been
extended to talk-in-interaction. Many researchers refer to the seminal work
of DuBois (2007), who proposes that every stance has a subjective
dimension (i.e. about the speaker), an objective one concerning the person or
thing being evaluated, and an intersubjective dimension pertaining to
the social relationship between speaker and hearer. He refers to this as the
“stance triangle”. A stance utterance encapsulates the stance, and can be
regarded as its core element. Stance utterances make good objects for study
because a) they are usually short and succinct, and b) they tend to summarize
a speaker’s story or narrative. Typical stance utterances might be “I’m sorry,
but that’s not exactly what I had in mind”, “There’s a reason why we do
this”, or “I don’t even know if that’s enough” (emphasis added). Further
examples are given below, with purported effects (punctuation omitted):
(3) Typical stance utterances (all negative valency) TOPIC
a. The worst is yet to come [global warming]
b. Hillary (Clinton) does not inspire confidence [politics]
c. Frankly, I can’t understand how people put up with this [migration]
d. The Internet hasn’t enriched my life in any significant way [modern life]
e. Keeping up relations takes a lot of work [social obligations]
f. Every day I eat the same thing [food]
Judgements of disposition are based on tone-of-voice as well as words,
however. In order to test for it, it is necessary to expunge all lexical content.
Nooteboom (2000) suggests using ‘iterant’ speech, that is substituting
nonsense syllables for words, thus preserving prosodic features. At present
this can only be done by humans, and is most effective when the forms are
Discussion
The results of these tests were predictable. Subjects could easily determine
valency based on their choice of terms to describe perceived disposition, e.g.
50 M. Campana
References
Cook, N. 2003. Tone of Voice and Mind. John Benjamins Publishing Co.,
Amsterdam.
Crystal, D. 1975. The English Tone of Voice. Edward Arnold, London.
DuBois, J. 2007. The stance triangle. In Stancetaking in Discourse. R. Englebretson
(ed.), John Benjamins Publishing Co., Amsterdam.
Laver, J. 1980. The Phonetic Description of Voice Quality. CUP.
Nooteboom, S. 2000. The prosody of speech: Melody and rhythm. MS, Research
Institute for Language and Speech, Utrecht.
Wichmann, A. 2000. The attitudinal effects of prosody and how they relate to
emotion. Proc. of ISCA Workshop on Speech and Emotion; Cowie, R., E.
Douglas-Cowie, & N. Schroder (eds.)
Wierzbicka, A. 1999. Emotion Across Languages and Cultures. CUP.
Intonation and polar questions in Greek
Anthi Chaida, Angeliki Sotiriou, Athina Kontostavlaki
Laboratory of Phonetics and Experimental Linguistics, University of Athens, Greece
Abstract
The present study focuses upon the effects of lexical stress and focus on Greek polar
(yes/no) questions. According to the results of a production experiment, the tonal
structure of neutral questions presents striking similarities with the tonal structure of
questions with focus on the final element. Questions with focus in the first element
display a different tonal structure and do not show the typical F0 fall on the stressed
syllable of the nucleus. The peak of the tonal boundary in these questions aligns with
the last stressed syllable, while in neutral questions and in questions with focus in
the final element it aligns with the last syllable of the utterance.
Key-words: intonation, polar questions, lexical stress, focus, Greek.
Introduction
This study aims to investigate the interaction of lexical stress and focus with
the intonation of polar (yes/no) questions in Greek. Although different
sentence types, and specifically polar questions as well as focus, have been
the object of several studies (e.g. Chaida 2010), the effect of the position
of lexical stress on tonal contours, and especially on tonal boundaries, still
remains an open question (see Botinis et al. 2016). According to previous
studies, the tonal structure of polar questions consists of a low nuclear tone,
followed by a rising-falling tonal movement at the right edge of utterances.
More specifically, the tonal peak has been found to align either with the last
syllable of the sentence when focus is on the last word or with the last stressed
syllable when focus is earlier (Grice et al. 2000, Arvaniti 2002, Baltazani 2007,
Chaida 2010).
Experimental methodology
One simple sentence was crossed with 3 focus renditions (no focus, focus on
the first element, focus on the final element), and 3 lexical stress placements
on the final element (Table 1). The speech material was placed in 3 lists with
random order, and was produced by 10 female speakers aged 20-40 years old
with standard Athenian pronunciation. The speakers were given verbal
instructions and provided with contextual information and a suggested
answer for every question. The total corpus
consisted of 270 utterances (3 sentences × 3 focus renditions × 10 speakers
× 3 repetitions).
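The size of the corpus follows directly from the fully crossed design. A minimal sketch of that arithmetic is given below; the stress-placement labels and all identifier names are placeholders assumed for illustration, not the labels used in the recordings.

```python
from itertools import product

# Placeholder labels; the identifiers and the stress labels are assumptions.
stress_placements = ["stress_A", "stress_B", "stress_C"]   # 3 sentences
focus_renditions = ["no focus", "focus on first element", "focus on final element"]
speakers = [f"speaker_{i:02d}" for i in range(1, 11)]       # 10 speakers
repetitions = [1, 2, 3]                                     # 3 repetitions

# Every combination of the four factors is one recorded utterance.
design = list(product(stress_placements, focus_renditions, speakers, repetitions))

utterances_per_speaker = len(design) // len(speakers)

print(len(design))             # 3 * 3 * 10 * 3 = 270
print(utterances_per_speaker)  # 3 * 3 * 3 = 27
```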
The speech productions were recorded directly onto a computer hard disk
in the sound-isolated recording booth of the Laboratory of Phonetics and
Computer Linguistics of the University of Athens. The speech material was
analyzed with Praat software, and the relevant data were automatically
generated with the script ProsodyPro (Version 5.6.0) (Xu 2013). MS
Excel and a Python script were used for the creation of graphs.
Table 1. Speech material of polar questions used for recordings, based on 3 different
lexical stress placements (ˈ) crossed with 3 focus renditions (in bold).
Figure 1a. Intonation of polar questions with stress on the antepenultimate
syllable in 3 focus renditions.
Figure 1b. Intonation of polar questions with stress on the penultimate syllable in
3 focus renditions.
Figure 2a. Intonation of polar questions without focus, with 3 lexical stress
placements on the final element.
Figure 2b. Intonation of polar questions with focus as well as 3 lexical stress
placements on the final element.
Figure 3a. Intonation of polar questions with focus on the first element and 3
lexical stress placements on the final (Realisation A, 62% of the utterances).
Figure 3b. Intonation of polar questions with focus on the first element and 3
lexical stress placements on the final (Realisation B, 38% of the utterances).
References
Arvaniti, A. 2002. The intonation of yes-no questions in Greek. In M. Makri-
Tsilipakou (ed.), Selected papers on theoretical and applied linguistics 71-83.
Thessaloniki: Aristotle University.
Baltazani, M. 2007. Intonation of polar questions and the location of nuclear stress
in Greek. In Carlos Gussenhoven & Tomas Riad (eds.), Tones and tunes,
Volume II: Experimental Studies in Word and Sentence Prosody (387-405).
Berlin: Mouton de Gruyter.
Botinis, A., Chaida, A., Nikolaenkova, O., Nirgianaki, E. 2016. Intonation and polar
questions in Greek revisited. Proc. ExLing 2016 (this volume).
Botinis, A., Granström, B., Möbius, B. 2001. Developments and paradigms in
intonation research. Speech Communication 33 (4), 263-296.
Chaida, A. 2010. Production and Perception of Intonation and Sentence Types
in Greek. PhD Thesis, University of Athens.
Grice, M., Ladd, R. & Arvaniti, A. 2000. On the place of phrase accents in
intonational phonology. Phonology 17, 143-185.
Xu, Y. 2013. ProsodyPro — A Tool for Large-scale Systematic Prosody Analysis.
In Proceedings of Tools and Resources for the Analysis of Speech Prosody
(TRASP 2013), Aix-en-Provence, France, 7-10.
Contextual predictions and syntactic analysis: the
case of ambiguity resolution
Daria Chernova, Veronika Prokopenya
Laboratory for Cognitive Studies, St. Petersburg State University, Russia
Abstract
We test the hypothesis that syntactic analysis is based on contextual predictions and
is guided by discourse salience of the referents. The head of the complex noun
phrase tends to be more prominent in discourse as native speakers expect the
continuation of the story to refer to N1 more often than to N2. It corresponds to the
data on adjunct attachment interpretation.
Key words: contextual prediction, syntactic analysis, referent activation
Introduction
The problem of syntactic ambiguity resolution is widely discussed in
psycholinguistics, being a testing ground for different parsing models
(Traxler 2014). The question is what guides the choice of the interpretation
when grammar allows several possible variants.
Adjunct attachment ambiguity (I met the servant of the countess that
was on the balcony) is particularly widely discussed cross-linguistically as
different preferences in different languages contradict the idea of
universality and are inconsistent with the Late Closure Principle (Cuetos &
Mitchell 1988, Grillo & Costa 2014). Previous studies (Sekerina 2003,
Yudina et al. 2007) show a high attachment preference for Russian.
We test the hypothesis that syntactic analysis is guided by discourse
salience of the referents and is based on contextual predictions (Rohde et al.
2011). We presuppose that the listener/the reader expects further information
about a more salient (activated) referent (Chafe 1994). The referent which is
mentioned in a story-continuation task more often is more discourse salient
and thus is more likely to attract the adjunct.
Method
Materials and design
12 experimental stimuli were constructed for a fill-in-the-blank task. Each
stimulus consisted of two sentences: the first sentence contained a complex
noun phrase, and the second sentence was a continuation of the first
and could refer equally plausibly either to N1 or to N2 (as in (1)). N1 and N2
had the same number, gender and animacy. The subject of the second
sentence was omitted and replaced by a gap, which the participants were
asked to fill with any appropriate word.
The questionnaire also included 62 fillers which contained no ambiguity.
(1) ‘I met the servant of the countess in the street. For many years
___________ lived nearby.’
Results
The gap was filled with one of the following variants: N1 or its periphrasis, N2
or its periphrasis, a noun which could refer both to N1 and N2, a 3rd person
pronoun, or any other word (see Table 1).
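Once each response has been coded into one of these five categories, the counts and shares per category can be tallied mechanically. The sketch below shows one way to do this; the category labels, the function name and, above all, the example responses are hypothetical and are not data from the study.

```python
from collections import Counter

# Category labels paraphrase the five response types listed in the text.
CATEGORIES = ("N1", "N2", "ambiguous noun", "pronoun", "other")


def tally(coded_responses):
    """Return {category: (count, share)} for a list of coded gap fillers."""
    counts = Counter(coded_responses)
    total = len(coded_responses)
    return {
        category: (counts[category], counts[category] / total if total else 0.0)
        for category in CATEGORIES
    }


# Hypothetical coded responses, NOT data from the study.
example = ["N1", "N1", "N2", "pronoun", "N1", "other", "ambiguous noun", "N2"]

for category, (n, share) in tally(example).items():
    print(f"{category}: {n} ({share:.1%})")
```

A higher share of N1 continuations than N2 continuations would, under the hypothesis tested here, indicate that N1 is the more discourse-salient referent.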
Discussion
From our data we can draw two main conclusions:
Acknowledgements
The study was supported by a grant from the Russian Humanitarian Scientific Fund,
#14-04-00586.
References
Chafe, W. 1994. Discourse, consciousness, and time: The flow and displacement of
conscious experience in speaking and writing. Chicago, University of Chicago
Press.
Sekerina, I. 2003. The Late Closure Principle in Processing of Ambiguous Russian
Sentences. In: The Proceedings of the Second European Conference on Formal
Description of Slavic Languages. Potsdam: Universität Potsdam.
Abstract
The present study examines acoustic manifestations of vocal fatigue in three
groups of voice professionals (pronunciation teachers, professional speakers and
tourist guides) who seem to be particularly susceptible to vocal loading. The paper
describes the data collection and the non-fatigue/fatigue speech corpus and presents
a detailed acoustic analysis of the data obtained. The results of the acoustic
analysis showed a consistent dependency between acoustic parameters and vocal
fatigue in terms of F0, jitter and shimmer values. The results can contribute to
objective voice examinations and automatic voice pathology detection.
Key words: vocal fatigue, acoustic analysis, voice professionals, speech corpora
Introduction
Vocal fatigue is a voice disorder which particularly concerns professional
voice users and can lead to serious pathological conditions. Teachers, singers,
actors, guides and all other professional speakers whose work requires prolonged
voice use are identified as an at-risk group for developing vocal disorders.
The symptoms of vocal fatigue are various and are explained by the physiological
mechanisms of voice production. Many studies on vocal fatigue exist,
offering various concepts of the phenomenon. However, there is no
universally accepted definition. It can be viewed either as a voice disorder
caused by other pathological voice conditions or as a separate voice problem
resulting from prolonged and excessive voice use [10]. In this study
vocal fatigue is understood as a separate phenomenon caused by excessive
professional voice load, which results in auditory-perceptual and acoustic
changes in the voice signal and can lead to serious pathological conditions.
The present paper aims to describe the data collection for the
non-fatigue/fatigue speech corpus and to present the results of the acoustic analysis.
Methods
The methodologies used to induce vocal fatigue in experiment
participants vary across the numerous works on vocal fatigue [1-9]. In most
studies vocal fatigue is induced artificially as a result of reading or
speaking tasks of various types. The results described are inconsistent and
often conflicting. The conditions of our experiment seem to be more
Results
We calculated (in Praat) a number of acoustic parameters based on formant
values, jitter, shimmer, pitch and loudness which can help to detect the
absence/presence of voice fatigue in a given speech sample. The parameters
which seem to be most important for automatic detection are the mean value
of F0, jitter and shimmer values.
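For readers unfamiliar with these measures, the sketch below illustrates the commonly used definitions of local jitter, local shimmer, RAP and DDP on a sequence of glottal periods and peak amplitudes. It is an illustration under stated assumptions, not the analysis pipeline of this study: the function names and the input values are hypothetical, and the period-selection criteria that Praat applies before computing these measures are ignored.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, in % of the mean value.

    Applied to periods this gives local jitter; applied to amplitudes, local shimmer.
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * mean(diffs) / mean(values)


def rap(periods):
    """Relative average perturbation: deviation from a 3-point average, in %."""
    devs = [
        abs(periods[i] - (periods[i - 1] + periods[i] + periods[i + 1]) / 3.0)
        for i in range(1, len(periods) - 1)
    ]
    return 100.0 * mean(devs) / mean(periods)


def ddp(periods):
    """Mean absolute difference of consecutive differences, in % of the mean period."""
    second_diffs = [
        abs((periods[i + 1] - periods[i]) - (periods[i] - periods[i - 1]))
        for i in range(1, len(periods) - 1)
    ]
    return 100.0 * mean(second_diffs) / mean(periods)


# Hypothetical measurements, NOT data from the corpus.
periods = [0.0050, 0.0052, 0.0049, 0.0051, 0.0050]  # seconds
amplitudes = [0.80, 0.78, 0.83, 0.79, 0.81]          # arbitrary units

print(f"jitter (local):  {local_perturbation(periods):.3f} %")
print(f"shimmer (local): {local_perturbation(amplitudes):.3f} %")
print(f"rap: {rap(periods):.3f} %   ddp: {ddp(periods):.3f} %")
```

With these definitions DDP is algebraically three times RAP, so the two measures carry the same information on different scales.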
The calculations showed that the main tendency for both male and
female speakers was an increase in the mean value of F0 in fatigued
speech across all the speaker groups. The jitter values, however, become
lower. As to the shimmer value, a decrease can be seen in fatigued
female voices and an increase in fatigued male voices. Tables 1-3 below
show the results.
Jitter
                       local %   local, absolute (seconds)   rap %   ppq5 %   ddp %
Female   non-fatigue   2,283     0,00011                     1,002   1,051    3,008
         fatigue       2,208     0,008578921                 0,97    1,036    2,91
Male     non-fatigue   3,239     0,000272208                 1,273   1,421    3,82
         fatigue       2,888     0,000228958                 1,085   1,229    3,254
All      non-fatigue   2,556     0,000156442                 1,08    1,157    3,24
         fatigue       2,403     0,006036776                 1,003   1,091    3,008

Shimmer
                       local %   local (dB)   apq3 %   apq5 %   apq11 %   dda %
Female   non-fatigue   8,022     0,833        2,653    4,068    7,871     7,96
         fatigue       8,108     0,837        2,666    4,168    8,008     7,998
Male     non-fatigue   11,003    1,063        3,777    5,775    12,18     11,33
         fatigue       10,377    1,015        3,521    5,387    11,28     10,56
All      non-fatigue   8,874     0,898        2,974    4,556    9,103     8,923
         fatigue       8,756     0,887        2,91     4,516    8,943     8,731
Conclusions
The results of the acoustic analysis of fatigued speech in
comparison with non-fatigued speech showed a consistent dependency
between acoustic parameters and vocal fatigue. The parameters which are
affected by vocal fatigue are the F0, jitter and shimmer values and the
duration and number of pauses. The differences in the acoustic parameters
before and after vocal loading mainly seem to reflect increased muscle
activity as a consequence of excessive vocal loading.
The results can contribute to objective voice examinations and automatic
voice pathology detection.
62 K. Evgrafova, V. Evdokimova, P. Skreliv, T. Chukaeva
References
Boucher, V.J. 2008. Acoustic Correlates of Fatigue in Laryngeal Muscles: Findings
for a Criterion-Based Prevention of Acquired Voice Pathologies. Journal of
Speech, Language, and Hearing Research, vol. 51, 1161–1170.
Caraty, M.J., Montacié, C. 2010. Multivariate Analysis of Vocal Fatigue in
Continuous Reading, Proceedings of Interspeech 2010, 470-473.
Kostyk, B.E., Rochet, A.P. 1998. Laryngeal airway resistance in teachers with vocal
fatigue: a preliminary study. Journal of Voice, vol. 12, 287–299.
Sala, E., Airo, E., Olkinuora, P. et al. 2002. Vocal Loading among Day Care Center
Teachers. Logoped Phoniatr Vocol, vol. 27, 21–28.
Schneider, B. 2006. Effects of Vocal Constitution and Autonomic Stress-Related
Reactivity on Vocal Endurance in Female Student Teachers. Journal of Voice,
vol. 20, No. 2, 242–250.
Scherer, R.C., Titze, I.R. et al. 1986. Vocal fatigue in a professional voice user. In
Transcripts of the Fourteenth Symposium: Care of the Professional Voice, New
York: The Voice Foundation, pp.124–130.
Scherer, R.C., Titze, I.R. et al. 1991. Vocal fatigue in a trained and an untrained
voice user. Laryngeal Function in Phonation and Respiration, San Diego,
Singular Publishing Group, pp. 533–555.
Titze, I., Lemke, J., Montequin, D. 1997. Populations in the U.S. workforce who
rely on voice as a primary tool of trade: a preliminary report. Journal of Voice,
vol. 11, 254–259.
Laukkanen, A.M. 1995. On speaking voice exercises. PhD dissertation, Acta
Universitatis Tamperensis, ser A, vol. 445, Tampere: University of Tampere.
Creating a subcorpus of a heritage language on
the example of Yiddish
Valentina Fedchenko1, Ilia Uchitel2
1 Department of Jewish Culture, St. Petersburg State University, Russia
2 School of Linguistics, Higher School of Economics in Moscow, Russia
Abstract
The paper presents a Yiddish heritage subcorpus built on the basis of the Corpus of
Modern Yiddish. The contemporary status of the Yiddish language and the absence
of monolingual speakers nowadays make it a perfect candidate for research within
the framework of heritage languages. Yiddish exists in different sociolinguistic
contexts and forms many bilingual pairs. A corpus-linguistic approach, especially
in corpora with multimedia utilities and an L2 component, enlarges the variety of
possible instruments and subjects of research. The paper discusses practical issues of
creating and using a multimodal corpus of the Yiddish language with a special focus
on the more recently added subcorpus of recorded interviews with L2 speakers of
Yiddish, while analyzing the corpus architecture, the corpus representativity, and L2
corpus marking.
Key words: corpus linguistics, Yiddish, multimedia corpus, L2 corpus, heritage
language.
Corpus data
Among the most valuable projects of recent years are the Corpus of
Modern Yiddish (CMY; http://web-corpora.net/YNC/search/) and the Yiddish
Multimedia Corpus (http://web-corpora.net/YiddishMultimediaCorpus/search/).
The first includes documents representing the language of the press and fiction
from the XIX to the late XX century, including modern documents. The second
presents annotated audio recordings of authentic Yiddish speech, with speakers coming
from various dialect areas. It includes 10 files: lectures and field recordings.
With online search available, these sources offer a great opportunity for
quantitative studies of the Yiddish language, as well as for learning
Yiddish as a second language.
CMY currently contains 4 150 933 tokens from 3662 documents. The
largest part of it, press (78.43%), is mostly represented by the
archive of the “Forverts” newspaper, published in the US, with issues dating
from 2004. Some of the authors of the press texts are not native speakers or are,
to some extent, heritage speakers.
Therefore, the first step in constructing a heritage and L2 subcorpus is
to establish the sociolinguistic characteristics of the authors. Such characteristics
should include, at least, the type of language knowledge
(native/heritage/second), the first language (if applicable), and the year of birth of the
speaker. Some additional information, however, would enrich the set: for
Acknowledgements
The authors thank the Russian Science Foundation for financial support (project
15-18-00062, St. Petersburg State University).
References
Avineri, N. R. 2012. Heritage language socialization practices in secular Yiddish
educational contexts: the creation of a metalinguistic community. Dissertation in
Applied Linguistics, University of California.
Klyachko E., Arkhangelskiy T., Kisselev O., Rakhilina E. 2013. Automatic error
detection in Russian learner language. Corpus Linguistics. Lancaster, UK, July
22–26, 2013.
Levine, G. S. 2000. Incomplete L1 acquisition in the immigrant situation: Yiddish in
the United States. Tübingen, Niemeyer.
Sadan, T. 2011. Yiddish on the Internet. Language and Communication 31(2), 99-106.
Safadi, M. 2000. Yiddish: its survival in an English-dominant environment.
Dissertation, University of California.
Shandler, J. 2008. What is American Jewish Culture? In Raphael, M. L. (ed.), The
Columbia History of Jews and Judaism in America, 337-365. Columbia
University Press, New York.
Affricates in the spontaneous speech of
Aromanians in Turia
Anastasia V. Kharlamova
General Linguistics Department, Saint Petersburg State University, Russia
Abstract
This paper deals with the affricate inventory of Aromanian spontaneous speech,
using the spoken materials collected in Turia (Greece) in 2002 for the Small
Dialectological Atlas of the Balkan Languages. The purpose is to analyse the
affricates present in the Turia Aromanian dialect and their development. The texts,
which had previously been written down in Romanian-based Aromanian orthography
with the help of a native Aromanian speaker, were transcribed using the computer
programs Sound Forge and Speech Analyzer. The instrumental analysis shows that
there are eight affricates to be found in our materials: [t͡s], [d͡z], [t͡’s’], [d͡’z’], [t͡ʃ],
[d͡ʒ], [t͡ɕ], and [d͡ʑ]. However, there is evidence of these sounds – most notably [t͡s]
and [d͡z] – being in the process of losing their stop phase. On the other hand, there
are also instances of an opposite process, namely a fricative phase appearing after
[t]. Both processes are fairly well-known typologically and among the Indo-
European languages, but there has been previously little to no research on affricate
development in Aromanian.
Key words: instrumental phonetics, Aromanian language, Turia, affricates, stops
Turia Aromanian
Kranea (Greek), or Turia (Aromanian), is a village with a population of circa
600, located in the Pindos Mountains in Greece, on the border between the
administrative districts of Western Macedonia and Thessalia (Бара и др.
2005: 16). The inhabitants of the village identify themselves as Greeks, but
call themselves “Vlachs” (Βλάχοι), and their language limba noastrā ‘our
language’, vlāhești ‘Vlach’, and armānești ‘Aromanian’. There is a
widespread opinion among them that Aromanian can’t be written (Бара и
др. 2005: 17).
The Turia variety of Aromanian is given a full and highly detailed
description in (Бара и др. 2005). We shall here only summarize the phonetic
characteristics of this dialect.
It shows many of the chief features of the Southern Aromanian dialect
zone, among them the reduction of non-accented /e/ and /o/ to /i/ and /u/,
the occurrence of non-syllabic /u/ and /i/ after final consonants, syncopes, etc.
(Бара и др. 2005).
The recordings of the Turia Aromanians' spontaneous speech that we
used in our research are available on a CD attachment to (Бара и др. 2005).
The list of speakers is given in (Бара и др. 2005: 20-22). We mostly used the
recording of the speech of Anastasia Pissoni (born in Turia in 1931,
housewife).
The first transcription of the analyzed texts had been made by M. Bara,
one of the authors of (Бара и др. 2005), herself a speaker of Aromanian.
However, it was based chiefly on her language intuition, and therefore often
reflects her interpretation of the sounds rather than what really was recorded.
Our own transcription was made with the help of two programs
developed for phonetic and acoustic research – Sound Forge and Speech
Analyzer. Sound Forge was used for building oscillograms and writing down
the transcription, while Speech Analyzer was used for spectrograms.
Affricated [t]
There are several clear cases of affrication of [t] in the analysed data: 3
occurrences of full affrication and 10 appearances of an audible fricative
phase. All of them are recorded before front vowels.
In (Kümmel 2007) this process is mostly found in reconstructions.
However, there are two well-known and notable examples of [t]-affrication,
one of them occurring in late Latin and influencing the whole subsequent
Romance group, and the other being one of the results of the High German
consonant shift.
Therefore, although affrication is not as widely spread as loss of stop
phase, it is still not a rare process typologically. Most importantly, it has
already taken place once in the history of the Romance languages.
Future research
Our main perspectives for future research include: first, observation of this
dialect’s development over the years; second, collection of spontaneous
speech data from other Aromanian dialects; third, use of our knowledge of
contacts of Aromanian with other languages to better describe and predict its
language changes.
References
Berns, J. 2014. A Typological Sketch of Affricates. Linguistic Typology, 18 (3).
Capidan, Th. 1932. Aromânii. Dialectul aromân. București, Imprimeria națională.
Gołąb, Z. 1984. The Arumanian dialect of Kruševo in SR Macedonia, SFR
Yugoslavia. Skopje, Macedonian Academy of Sciences and Arts.
Kümmel, M. 2007. Konsonantenwandel: Bausteine zu einer Typologie des
Lautwandels und ihre Konsequenzen für die vergleichende Rekonstruktion.
Wiesbaden, Reichert Verlag.
Meyer-Lübke, W. 1890. Grammatik der Romanischen Sprachen, Bd. 1. Leipzig,
Fues’s Verlag.
Nedelkov, J. 2009. The Ethnic Code of the Vlachs at the Balkans.
EthnoAnthropoZoom 6, 221-253.
Papahagi, T. 1974. Dicționarul dialectului aromân general și etimologic. Ediția a
doua augmentată. București: Editura Academiei Republicii Socialiste România.
Rothe, W. 1957. Einführung in die historische Laut- und Formenlehre des
Rumänischen. Tübingen, Max Niemeyer Verlag.
Бара М., Каль Т., Соболев А. Н. 2005. Южноарумынский говор села Турья
(Пинд). München, Biblion Verlag.
Нарумов, Б. П. 2001. Арумынский язык/диалект. In Жданова Т. Ю. и др. (ред.)
2001, Языки мира. Романские языки, 636–656. Москва, Academia.
Харламова, А. В. 2015. Опыт фонетического анализа арумынской спонтанной
речи. In Чердаков Д. Н. (ed.), XVIII Международная конференция
студентов-филологов. Тезисы докладов. Санкт-Петербург,
Филологический факультет СПбГУ.
L1 transfer, definiteness and specificity of
determiners in L2 English
Sviatlana Karpava
University of Central Lancashire, Cyprus
Abstract
This study investigates L1 transfer from Cypriot Greek (CG), definiteness and
specificity of determiners in L2 English. 100 CG undergraduate students (ages 17-
23) participated in the study. Linguistic (socio-economic) background
questionnaires were administered. Their written corpus (100 essays) was analysed in terms
of determiner production. They also completed an elicitation task based on Ionin
et al. (2003, 2004), which was focused on elicitation of definite determiner the in
[+def; +spec] and [+def; ‒spec] environments and indefinite determiner a in [‒def;
+spec] and [‒def; ‒spec] environments. The results of the study showed that the
most problematic condition for CG students was [‒def; +spec] with target indefinite
determiner as they fluctuated in their written production between target and non-
target settings.
Key words: determiners, definiteness, specificity, L1 transfer
Introduction
The acquisition of articles in L2 English has been found to be a very difficult process
(Huebner, 1983; Master, 1987; Parrish, 1987; Robertson, 2000; Leung,
2001; Ionin et al., 2008). L2 learners make omission or substitution errors
(Larsen-Freeman, 1975; Thomas, 1989; Parodi et al., 1997; Hawkins et al.,
2006). L2 learners either have access to Universal Grammar (UG), directly
or via their L1, which is in line with the domain-specific view of L2
acquisition, or they use general learning mechanisms such as statistical
learning, which is in line with the domain-general view (Ionin et al., 2008).
Definite articles are presuppositional expressions, while indefinite
articles are quantificational expressions, as for the latter there is no prior
presupposition or mention (Heim, 1991). In English, the definite article the
presupposes that the referent has been established by prior knowledge or
discourse and that this knowledge is shared by both listener and speaker
(Ionin, 2003, 2006). Learning of articles involves form-meaning mapping.
Definiteness is one of the cross-linguistic semantic universals, the other is
specificity. L2 learners have access to both universals and they fluctuate
between them. Ionin et al. (2003, 2004, 2008) observed that L2 learners of
English have more accurate performance on [+def; +spec] and [‒def; ‒spec],
when there is agreement between definiteness and specificity, than on [+def;
‒spec] and [‒def; +spec], when the two universals are in conflict. English
Study
100 CG undergraduate students (ages 17-23; L2 proficiency: beginner,
intermediate and advanced) participated in the study. Linguistic
(socio-economic) background questionnaires were administered. Their written corpus (100
essays) was analysed in terms of determiner production. They also
completed an elicitation task based on Ionin et al. (2003, 2004), which
focused on elicitation of the definite determiner the in [+def; +spec] and [+def;
‒spec] environments and the indefinite determiner a in [‒def; +spec] and [‒def;
‒spec] environments. The participants were asked to choose from three
options each time (the, a or Ø); there were 10 items for each condition. The
task also investigated whether L2 learners of English transfer from their L1:
they were asked to choose the appropriate variant (the, a or Ø) in
semantic and syntactic environments where CG and English differ in terms
of article use (Holton et al., 2004; Buschfeld, 2013). There were also
distractor items focused on the use of various tenses.
References
Buschfeld, S. 2013. English in Cyprus or Cyprus English? An Empirical
Investigation of Variety Status. Amsterdam: John Benjamins.
Heim, I. 1991. Artikel und Definitheit. In Stechow, A. and Wunderlich, D. (eds.),
Semantik: Ein internationales Handbuch der zeitgenössischen Forschung, 487-
535. Berlin: de Gruyter.
Ionin, T. 2003. Article Semantics in Second Language Acquisition. Unpublished
doctoral dissertation, MIT.
Ionin, T. 2006. This is definitely specific: specificity and definiteness in article
systems. Natural Language Semantics 14, 175-234.
Ionin, T., Ko, H. and Wexler, K. 2004. Article semantics in L2-acquisition: the role
of specificity. Language Acquisition 12, 3-69.
Ionin, T., Zubizarreta, M.L. and Maldonado, S.B. 2008. Sources of linguistic
knowledge in the second language acquisition of English articles. Lingua 118,
554-576.
Trenkic, D. 2000. The acquisition of English articles by Serbian speakers.
Unpublished PhD dissertation. University of Cambridge.
Writing-based wordforms vs. spoken wordforms
Vadim Kasevich¹, Iuliia Menshikova²
¹Faculty of Asian and African Studies, SpbU, Russia
²Faculty of Philology, SpbU, Russia
Abstract
This study addresses the important problem of reshaping Russian grammar to
conform to its real acoustic realization rather than to the traditional written
expression plane. In this way, one can switch from an entirely abstract coding of
wordforms to acoustic entities, first phonological and then phonetic, which underlie
the real processes of speech production and speech perception. This multiple
approach to grammar writing makes it necessary to develop a special database for
the phonologically represented wordforms of Russian. Typically, the respective
paradigms are reduced. More generally, the links camouflaged by the traditional
orthography are made visible. E.g., the Adjective Gender paradigm, normally made
up of three genders, is reduced to a two-item paradigmatic structure, because Neuter
Gender and Feminine Gender simply merge.
Key words: linguistics, Russian language, grammar, phonetics, morphemics.
Introduction
Natural language grammars as we know them may differ in many ways
depending on the theories that underlie them. However different, the vast
majority of existing grammars have at least three important things in
common, viz. (i) practically all of them are designed to account for the
formal structure of the language rather than for its functioning, (ii) even
where the grammars somehow model the dynamic nature of the language,
the sets of rules are typically intended for the speakers rather than for the
hearers, (iii) most grammars present their paradigms etc. in terms of standard
orthography rather than in terms of phonological representations.
Unlike the prevailing tradition referred to above, we choose an approach
where the grammar (of Russian) is modelled as a set of rules designed for
the hearer. Since hearers operate with the sound patterns of linguistic
entities, the expression plane of these entities should be presented in terms
of phonemes. E.g. the wordform КУПАТЬСЯ (kupat’s’a) ‘bathe’ is
normally written with the so-called particle –СЯ (–s’a). However, if we
switch to its sound shape, we find that the hearer must be prepared to
recognize, in addition to /kupal-s’a/ ‘bathed’, also /kupal’i-s’/ ‘[they]
bathed’, and /kupac-ca/ ‘[to] bathe’. In many cases, the phonology-based
representation reshapes the paradigm as compared with its writing-based
version, cf. НОВОЕ ‘new’ (Neuter) and НОВАЯ ‘new’ (Feminine), which simply
merge in /novaja/.
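The reshaping of a paradigm described above can be illustrated with a minimal sketch that groups paradigm cells by their phonological form. The two /novaja/ transcriptions come from the text; the masculine form and its transcription are assumptions added only so that the toy paradigm has three written cells, and the function name is hypothetical.

```python
from collections import defaultdict

# Toy nominative singular paradigm of the adjective 'new'.
# The /novaja/ transcriptions follow the text; the masculine
# transcription "novij" is an illustrative assumption.
PARADIGM = {
    "Masculine": ("НОВЫЙ", "novij"),
    "Neuter": ("НОВОЕ", "novaja"),
    "Feminine": ("НОВАЯ", "novaja"),
}


def group_by_sound(paradigm):
    """Map each phonological form to the paradigm cells it expresses."""
    cells = defaultdict(list)
    for gender, (_spelling, phonemic) in paradigm.items():
        cells[phonemic].append(gender)
    return dict(cells)


written_cells = {spelling for spelling, _ in PARADIGM.values()}
spoken_cells = group_by_sound(PARADIGM)

print(len(written_cells), "written forms,", len(spoken_cells), "spoken forms")
for phonemic, genders in spoken_cells.items():
    print(f"/{phonemic}/: {', '.join(genders)}")
```

Any phonological form that maps to more than one cell marks a distinction that the hearer cannot recover from the wordform alone.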
Methods
As our goal is to “redress” Russian morphology in such a way that its
expression plane is consonant with the phonology, our first step is to
provide all the (nominal) wordforms of the Russian lexicon with a
phonological transcription. E.g., ОДЕЯЛ-О ‘blanket’ → /ad’ijál-a/. To make
sure that our phonological transcription faithfully reproduces the expression
plane of the chosen wordforms, our subjects were asked to screen the output
of the transcription routine.
The second step is to develop a database for nouns in which all the relevant
information about individual nouns is stored (see Kasevich et al., this
volume).
A different situation arises where there is a wide gap between the writing
and sound systems. If we compare, say, Russian and English, we will see that
the Russian writing system is relatively simple and systematic, while the
English system is notorious for its very unsystematic, sometimes extravagant,
relationship between writing and sound. This means that the analyst will be
confronted with very different tasks depending on the language.
It is also interesting to study the sound-writing relation from the point of
view of how writing reflects diachronic shifts. For Russian, it could be
hypothesized that, at least in some cases, the reduction phenomena described
above synchronically recapitulate diachronically important developments
(like Weak Vowel Drop, etc.).
Finally, a few more words could be added about our problem from the applied
linguistics perspective. Stripping the wordform of its writing ‘dress’ is
not the end of the story, although it is surely a prerequisite to writing
computer programs for automatic speech perception and speech production.
Phonologically transcribed speech, especially a piece of fluent text, is
still very far from the real acoustic speech signal with all its
redundancy on the one hand and imperfections and missing portions on the
other. It is quite typical to be exposed to a speech signal so impoverished
that only a good deal of guesswork makes adequate perception possible.
There is one more very important problem that cannot be neglected,
given the goal of our study: the prosodic (here accentual)
characteristics, which are indispensable for any wordform of Russian. It has
been demonstrated in many experiments that lexical stress (accent) is an
independent parameter in speech perception. According to our findings, it is
quite typical for accent recognition scores to be much higher than
those for phonemes or syllables. It is very likely that the overall
language system contains a separate, relatively independent prosodic
subsystem. This subsystem comes into play first in speech perception; in
language acquisition, too, stress strategies are well developed before
all the other subsystems.
Here again, typological aspects are essential. To begin with, there
exist languages, like Mongolian, that have no lexical accent (stress) at
all (vowel harmony being a partial functional substitute). No statistics are
available, but it seems safe to argue that languages lacking lexical stress
are far fewer in number. However, if we turn to standard
written texts, where no accents are shown, we will see that the two language
types discussed above (with and without stress) come much closer together.
Within one language as well as cross-linguistically, various subsystems and
compensatory strategies are used to achieve approximately the same level
of efficiency in both perception and production, writing being one of the
factors in play.
Writing to some extent obscures the real number of homonyms
to be found in the language. According to our data, in Russian one finds
more than four thousand words which are written the same but differ in
the position of the stressed syllable, e.g. L’ÚBIM ~ L’UBÍM ‘[we]
love’ ~ ‘[he is] loved’. These are, so to speak, writing-made homonyms,
although ‘in reality’ they are a clear case of minimal pairs.
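A minimal sketch of how such writing-made homonyms could be detected: stress-marked forms are grouped by the spelling that remains once the stress mark is removed. The word list, the use of a combining acute accent to mark stress, and the function names are assumptions for illustration; this is not the authors' procedure.

```python
import unicodedata
from collections import defaultdict

ACUTE = "\u0301"  # combining acute accent, used here to mark the stressed vowel


def strip_stress(form):
    """Return the form as it appears in standard writing, without stress marks."""
    decomposed = unicodedata.normalize("NFD", form)
    return unicodedata.normalize("NFC", decomposed.replace(ACUTE, ""))


def writing_made_homonyms(stressed_forms):
    """Group stress-marked forms by spelling; keep spellings with 2+ forms."""
    groups = defaultdict(set)
    for form in stressed_forms:
        groups[strip_stress(form)].add(unicodedata.normalize("NFC", form))
    return {spelling: sorted(forms)
            for spelling, forms in groups.items() if len(forms) > 1}


# Illustrative input: the pair from the text plus one unambiguous word.
FORMS = ["лю\u0301бим", "люби\u0301м", "окно\u0301"]
homonyms = writing_made_homonyms(FORMS)
for spelling, forms in homonyms.items():
    print(spelling, "<-", ", ".join(forms))
```

Only the acute accent is removed, so letters such as й, which also decompose into a base letter plus a combining mark, are left intact.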
In some cases, the written/spoken dichotomy may determine the very
deep features that make the language typologically the way it is.
According to a witty observation of Professor Eugeny Jakhontov, Semitic
languages are typologically close to the isolating class when they
are written, but acquire most features of inflexional languages when they
are spoken. The thing is that in Semitic languages the so-called
schemata, whose function is to express grammatical meanings, are not
“visible” when written; that is, KiTaB ‘book’ and uKTub ‘write’, where KTB
is the root and i-a and u-u are the schemata, are written in the same way.
Of course, it is a comforting idea to believe in a unique grammar for
each language, our duty being to discover it. In reality, the situation is much
more complicated, and the written word/spoken word dichotomy adds a lot to
its complexity.
References
Baudouin de Courtenay J.A. 1912. On the Relation of Russian Writing to the
Russian Language. In Baudouin de Courtenay J.A. Selected Works on General
Linguistics. Vol. 2, 209-235. Moscow
Shcherba L.V. 1957. Baudouin de Courtenay and His Contributions to Linguistic
Studies. In Shcherba L.V. Selected Papers on the Russian Language, 85-96.
Moscow
Hockett, C.F. 1961. Grammar for the hearer. In Structure of Language and its
Mathematical Aspects. Proceedings of Symposia in Applied Mathematics, vol.
12, 220-236.
On the buildup of an integrated database for the
formal description of grammars for the hearers
Vadim Kasevich¹, Iuliia Menshikova², Maria Khokhlova², Elena Shuvalova²,
Anna Lastochkina³
¹Faculty of Asian and African Studies, SpbU, Russia
²Faculty of Philology, SpbU, Russia
³Faculty of Liberal Arts and Sciences, SpbU, Russia
Abstract
Grammars for the hearers often differ significantly from those for the readers, as
traditional orthographic notation of wordforms is unable to fully represent the actual
expression of the morphological categories and, consequently, the real composition
of the paradigms. As a first step in the construction of a grammar for the hearers,
one needs a database containing information on the spoken (phonological)
expression of the morphological units. At present the part of the database with
information on Russian nouns is complete. The entries in the database are Russian
noun forms of different declensions and accent paradigms that exhibit all the types of
stem endings able to shape the actual spoken realization of a form.
Key words: linguistics, Russian language, grammar, phonetics, morphemics.
Introduction
The idea of the project is based on two articles published in the 1970s: L.V.
Bondarko, L.A. Verbitskaya “On Phonetic Characteristics of Post-Tonic
Inflexions in the Contemporary Russian Language” and L.V. Bondarko, L.A.
Verbitskaya, M.V. Gordina, L.R. Zinder, V.B. Kasevich “Styles of
Pronunciation and Types of Pronouncing”. The experiments on which these
publications were based showed, in particular, that native speakers do not
distinguish “by ear” such word forms as, for example, новая and новое: they
merge into новая. This is not an isolated case, because such “merges” are
found in many different segments of the system of the modern Russian
language.
Baudouin de Courtenay was the first to identify the problem of describing the
grammar of a language on the basis of oral (primary) speech as one of the
central problems of descriptive grammar in particular and of theoretical
linguistics in general. However, more than a century after the publication of
Baudouin’s works this problem remains unsolved, which accounts for the
academic novelty of this project. For a long time solving it was considered
difficult, because it required well-developed and application-proven
phonological and grammatical theories. Present-day linguistics in Russia has
all the prerequisites for a systematic description of the grammatical structure
of the contemporary Russian language on the basis of its oral form, and the
problem of creating this description is of great current interest.
The authors are not aware of any Russian or foreign research teams working
on the problems raised in this paper. At the beginning of the
20th century there existed an international scholarly journal, Le Maître
Phonétique, in which all publications were printed in phonetic
transcription. However, it was a purely empirical project whose aim
was to popularize the use of transcription.
Methodology
Within the project, two basic problem areas are to be addressed. The first
is the creation of databases that reflect the changes in the inflectional
paradigms of Russian words depending on their sound/orthographic codes. The
second is to reveal the shifts in the system of Russian morphosyntax caused
by this recoding. The projected databases will make it possible to establish
the basic trajectories of change in paradigms after the change of the code
(modality) of the plane of expression of linguistic units. To solve this
problem we use the methods of classical structural linguistics, with its
focus on revealing formal paradigms that consist of opposed word forms, on
categorical analysis, on the neutralization of oppositions in specific
contexts, etc. The formal paradigms under analysis are seen as semanticized
structures, where the plane of expression and the plane of content are
inseparable, and shifts in semantics normally correlate with shifts in the
expression plane, and vice versa.
Considerable attention is given to the exploration (both theoretical and
experimental) of the category of neutralization in its complex relationship
with the category of homonymy.
The expected general outcome of the methods and approaches briefly
described above is a model that makes it possible to trace all the changes
that the language system undergoes in the transition from an orthographically
oriented to a phonologically oriented representation.
The first stage of the project is data collection and the presentation of
data within the framework of the existing database. It will be built “around”
the separate inflected parts of speech (nouns, adjectives, numerals, pronouns
and verbs). At the same time, we are going to use the results of database
processing to prepare material for perception experiments.
The results of the project are to be openly accessible, so choosing the data
format was an important decision. For the database under development we have
selected XML as the most universal format and the one best adapted to future
conversion. Below is an example of a fragment of the XML representation of
the lexical item «окно».
The database supports search with wildcards (in the language of
regular expressions), so it is possible to search for parts of words or
expressions. At present we are developing data-processing algorithms
for a database of the selected type on the basis of a completely
filled fragment of the noun database. The objects of the database are
word forms that represent Russian nouns of different declension types
and accent paradigms and that exhibit all the types of stem endings that can
influence the phonetic image of the word form. The fields of the database
contain information about the orthographic and phonetic image of a word
form, all of its morphological characteristics, the variability of
morphological forms and accent patterns, inflection indexes and accent
types. Separate fields contain the orthographic and phonetic images of the
stems and inflectional affixes included in each word form.
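The kind of record and search described above can be sketched as follows. The project's actual XML schema is not reproduced in this text, so every element name, attribute name and transcription below is a hypothetical stand-in chosen to mirror the fields the description lists (orthographic and phonetic image, morphological characteristics, stem and inflection, stress placement).

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entry for «окно»: element and attribute names are assumptions,
# not the project's schema; the transcriptions are illustrative.
ENTRY = """\
<lexeme lemma="окно" pos="noun" gender="neuter">
  <form case="nom" number="sg">
    <orth stem="окн" ending="о">окно</orth>
    <phon stem="akn" ending="o" stress="ending">akno</phon>
  </form>
  <form case="gen" number="sg">
    <orth stem="окн" ending="а">окна</orth>
    <phon stem="akn" ending="a" stress="ending">akna</phon>
  </form>
</lexeme>
"""


def search(lexeme, pattern, layer="phon"):
    """Return (case, number, text) for forms whose chosen layer matches pattern."""
    regex = re.compile(pattern)
    hits = []
    for form in lexeme.iter("form"):
        text = form.findtext(layer)
        if text is not None and regex.search(text):
            hits.append((form.get("case"), form.get("number"), text))
    return hits


lexeme = ET.fromstring(ENTRY)
print(search(lexeme, r"a$"))             # phonological forms ending in /a/
print(search(lexeme, r"^окн", "orth"))   # orthographic forms with the stem окн-
```

Keeping the orthographic and phonological images in parallel elements lets the same regular-expression query be run against either layer.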
References
Bondarko L.V., Verbitskaya L.A. 1973. On Phonetic Characteristics of Post-Tonic
Inflexions in the Contemporary Russian Language. In Problems of Linguistics,
No 1, 37-49.
Bondarko L.V., Verbitskaya L.A., Gordina M.V., Zinder L.R., Kasevich V.B. 1974.
Styles of Pronunciation and Types of Pronouncing. In Problems of Linguistics,
No 2, 64-70.
How to write an oral dialect or about some
problems of the Tsakonian Corpus
Maxim Kisilier
Hellenic Institute, Saint-Petersburg State University; Department of Comparative
and Areal Linguistics, Institute for Linguistic Studies (RAS), Russia
Abstract
The Hellenic Institute of Saint-Petersburg State University, in collaboration with the
Institute for Linguistic Studies of the Russian Academy of Sciences, has organized more
than twenty expeditions to South Kynouria in the Peloponnese (Greece) in order to
describe the Tsakonian dialect. During these expeditions the participants collected a
large number of oral texts in Tsakonian, and it was decided to create a Tsakonian
corpus so that this very interesting linguistic material could be easily accessed. This
paper provides the first description of the project and discusses its current problems.
Key words: Modern Greek dialectology, Tsakonian, language corpus.
Introductory remarks
Modern Greek dialectology has a rather long history. Many institutions
inside and outside Greece possess large collections of Modern Greek dialect
materials from various Greek-speaking regions. Unfortunately, most
of them remain unknown and unused, not only by typologists but even by
specialists in Modern Greek dialectology. Short dialect texts from these
collections are sometimes published as supplements to linguistic papers (cf.:
Kisilier 2009: 406–411; 2014: 342–344), but they can hardly be used for
serious linguistic analysis, as they provide just a general idea of the dialect
and may lack some very important features. More often, certain samples from
these collections appear in linguistic articles to illustrate a statement by
the author.
However, when the statement is false, the reader may be led to incorrect
interpretations of the example or even to erroneous conclusions in general,
since there is no opportunity to check the example or the statement. Thus
the Russian linguist Mikhail Sergievskiy, who was the first to describe the verb
system of Azov Greek, found perfect forms in this dialect (Sergievskiy 1934:
582–583). Azov Greek could therefore be grouped together with the few other
Modern Greek dialects that have a perfect/pluperfect along with the aorist.
No other description of Azov Greek mentions perfect forms, and an analysis
of the modern state of the verb in the dialect based on recently collected data
reveals no trace of perfect forms or any appropriate place for
them within the verb system (Kisilier 2009: 193–205). This ambiguous
situation can be easily explained. Sergievskiy found perfect forms in the
poems of Georgy Kostoprav, who tried to create a special language for Azov
Greek literature based both on local idioms and on some Demotic features
that in fact did/do not exist in the dialect, such as perfect forms (Kisilier
2009: 13–14).
The progress of modern technologies gives hope that one day dialect
examples will be looked up in text corpora rather than in books and articles.
At present there is still no open access corpus of any Modern Greek
dialect that can be really helpful for linguistic research (cf.:
http://griko.project.uoi.gr/), but many attempts in this direction have
already been made. In this paper I am going to describe briefly the Tsakonian
corpus project and some problems I have had to face.
Acknowledgements
This research was supported by the Russian Science Foundation (project No 15-18-
00062) and the Russian Foundation for Humanities (project No 14-04-00581).
References
Salminen, T. 2007. Europe and North Asia (Ch. 3). In Moseley, C. (ed.) 2007.
Encyclopedia of the world's endangered languages, 211–280. London; New
York, Routledge.
Deffner, M. 1923. Lexicon tis Tsakonikis dialektou. Athens, Typography “Estia”;
Meissner & N. Karagadouris.
Kontosopoulos, N. G. 2001. Dialektoi kai idiomata tis neas ellinikis. Athens, Grigori
Publishers.
Haralampopoulos, A. L. 1980. Fonologiki analysi tis Tsakonikis dialektou. Doctoral
dissertation. Aristotle University of Thessaloniki, Faculty of Philosophy.
Thessaloniki.
Domosiletskaya, M. V., Zhugra, A. V., Klepikova, G. P. 1997. Malyy
dialektologicheskiy atlas balkanskikh yazykov. Lexical questionnaire. St.
Petersburg, Institute for Linguistic Studies.
Kisilier, M. L. 2014. Tsakonskiy dialekt: novyy vzglyad. In Vydrin, V. F.,
Kuznetsova, N. V. (eds.) 2014. Ot Bikina do Bambalyumy, iz varyag v greki.
Ekspeditsionnye etyudy v chest’ Eleny Vsevolodovny Perekhval’skoy, 330–
348. St. Petersburg, Nestor-Istoria.
Kisilier, M. L. (ed.) 2009. Lingvisticheskaya i etnokul’turnaya situatsiya v
grecheskikh selakh Priazov’ya. Po materialam ekspeditsiy 2001–2004 g. St.
Petersburg, Aleteya.
Kisilier, M. L., Fedchenko, V. V. 2011. K voprosu o myagkikh soglasnykh v
tsakonskom dialekte novogrecheskogo yazyka. Indo-European Linguistics and
Classical Philology Yearbook XV, 259–266.
Sergievskiy, M. V. 1934. Mariupol'skie grecheskie govory. Opyt kratkoy
kharakteristiki. Izvestiya AN SSSR. Otdelenie obshchestvennykh nauk 7, 533–
587.
Some aspects of /r/ articulation in French Vocal
Speech
Ulyana Kochetkova
Department of Philology, Saint-Petersburg State University, Russia
Abstract
This study analyses some common and individual strategies in choosing /r/-variants
in French vocal speech. The problem of /r/ pronunciation is approached from a
new angle by considering deviations from a singer’s main /r/ articulation model. The
analysis comprised an examination of the individual and common
preferences of 2 different generations of singers in /r/ articulation in French lyric
songs and operatic arias, and a study of the frequency of deviations in different
phonetic contexts in relation to musical phrase boundaries.
Key words: singing, French, phonetic-phonological analysis, pronunciation models.
Introduction
The articulation of French /r/ is today one of the most discussed
subjects, owing both to its variability in contemporary standard French and
to differing views on how this consonant should be pronounced on stage, in
Opera as well as in Art Songs (Melodies). There is no absolute agreement
among singers, singing teachers, accompanists and coaches about which
variant is preferable: the “Italianate” apical alveolar trill or flap,
recommended from the 17th century onward as the only correct pronunciation
in singing (Bacilly 1679, Garcia 1851, Duval 1878, Lavoix, Lemaire 1881,
Grubb 1979, Yarbrough 1991), or the conversational uvular consonant. The
latter has been criticized for its “vulgarity” and for its destructive effect
on surrounding vowels and on airflow projection in general (Nedecky
2015), or recommended only to native French singers (Vennard 1967).
However, the uvular consonant is consistently observed not only in the
performances of some famous modern French singers (Nedecky 2015), but can
also be found (though seldom) even in the interpretations of renowned artists
of the past, who themselves strongly criticized it.
Today most contemporary non-French singers face certain problems
and difficulties when performing an opera or lyric song written by a French
composer, for French is one of the most complicated languages for a
non-native speaker to sing. Modern performing standards are high, but there
is a lack of broad theoretical and experimental work in this field, to which
the current study aims to contribute.
Results
In the studied material 15 different types of phonetic contexts were defined,
falling into 5 main groups: 1) intervocalic – VRV (“horizons”); 2) musical
phrase initial position – RV (“reviens”), RCV (“roi”), CRV (“cri”), CRCV
Conclusion
This study yielded the following observations: 1) different
/r/-articulation preferences in singing exist in both singers’ age groups; 2)
deviations from two different models are possible in both groups; 3) some
contemporary performers never choose the model with the uvular consonant
(even in lyric songs); 4) some contemporary singers use different main
/r/-articulation models in different styles (Romantic vs. Baroque); 5) some
phonetic contexts, as well as the initial or final position in a musical phrase,
may influence the occurrence of deviations from the chosen /r/-articulation
model.
References
Bacilly, B. (de). 1679. L’art de bien chanter de M. de Bacilly. Paris, Bacilly.
Garcia, M. 1851. Ecole de Garcia: traité complet de l’art du chant. Paris, Mayence.
Grubb, Th. 1979. Singing in French: a Manual of French Diction and French Vocal
Repertoire. Belmont, Schirmer Books.
Duval, G. 1878. Artistes et cabotins. Paris, Ollendorf.
Lavoix, H., Lemaire, Th. 1881. Le chant. Ses principes et son histoire. Paris,
Heugel et fils.
Nedecky, J. 2015. French Diction for Singers. A Handbook of Pronunciation for
French Opera and Melodie. Toronto, Book POD.
Vennard, W. 1967. Singing. The Mechanism and the Technic. New York, Carl
Fischer.
Yarbrough, J. 1991. Modern Languages for Musicians. Stuyvesant, Pendragon Press.
Different acoustic cues for emphasis in teaching
English word stress to Hong Kong Cantonese
ESL learners of different proficiencies
Wience Wing Sze Lai¹,², Manwa Lawrence Ng²
¹Hong Kong Community College, The Hong Kong Polytechnic University, Hong
Kong
²Speech Science Laboratory, Division of Speech and Hearing Sciences, The
University of Hong Kong, Hong Kong
Abstract
The present study examined English word stress produced by twenty-two native adult
speakers of Hong Kong Cantonese (CS) learning English as a second language (ESL),
11 highly proficient (HCS) and 11 less proficient (LCS), in comparison with that
produced by five native English speakers (NS). All participants read four English donor
words, and CS also read the corresponding Cantonese loanwords. The three acoustic
cues for stress, namely pitch (F0), duration (length) and intensity (loudness) values
of the vowels, were obtained from all syllables. While vowel duration was found to
be the dominant cue, followed by F0, in distinguishing stressed and unstressed
syllables in all speakers’ production, HCS may have overused F0 and LCS may have
underused vowel duration.
Key words: English word stress, Cantonese loanwords, acoustic cues, speaker
proficiency
Introduction
For Cantonese speakers (CS) who use English as a second
language (ESL), English word stress can be a challenge, because
Cantonese, as a tone language, makes use of pitch to distinguish lexical
meanings, while English, as a stress language, makes use of not only pitch
(fundamental frequency, F0) but also intensity (loudness) and duration
(length). With regard to Cantonese speakers’ acquisition of English word
stress, previous studies investigated either (1) Cantonese loanwords borrowed
from English (Lai, 2004; Lai, Wang, Yan, Chan, & Zhang, 2011; Silverman,
1992; Zhang, 1986) or (2) CS’s pronunciation of English words (Chan,
2007; Lai & Ng, 2014a; 2014b; Luke, 2000).
All studies in (1) agreed that loanword syllables corresponding to
stressed ones in English were assigned a high level (55) tone. Epenthetic
loanword syllables were assigned a low-mid (22) tone (Lai, 2004; Zhang,
1986), but loanword syllables corresponding to unstressed ones were assigned
a mid (33) tone (Zhang, 1986) or a low-mid (22) tone (Lai, 2004; Lai et al.,
2011). With regard to (2), while Chan (2007) found that CS could effectively
represent word stress by manipulating duration, intensity and F0, Lai and Ng
(2014a; 2014b) identified F0, rather than duration and intensity, as the
dominant cue for producing stress in HCS and LCS. Luke (2000) reported
stressed syllables as being assigned an H tone and unstressed ones an M or L
tone.
Revising Lai and Ng (2014a), which compared only HCS and
LCS (excluding NS) and measured parameters by segmenting syllables
instead of vowels, this study examines CS’s production of English word
stress in English donor words and the corresponding Cantonese loanwords by
identifying the most dominant acoustic cue among pitch, intensity and
duration of the vowels for HCS and LCS, as compared with NS.
Methodology
Twenty-two Cantonese ESL speakers (F=11; M=11), aged 18-24, were
recruited as target participants, known as CS. All CS were born in Hong
Kong and had lived there since birth. Among them, 11 were highly
proficient in English (with a grade “C” in HKALE UE or a grade “5” in
HKDSE English, equivalent to an IELTS score of 6.51, or above), and 11
were less proficient (with a grade “E” in HKALE UE or a grade “3” in
HKDSE English, equivalent to an IELTS score of 6.02, or below) (Hong
Kong Examination Authority, 2004; 2010). All CS were recruited from the
Hong Kong Community College (HKCC), The Hong Kong Polytechnic
University (PolyU) community. Five native speakers (F=2; M=3) of British
English were recruited as controls, known as NS. They were all residents of
the United Kingdom. All participants had normal hearing, speech and
language ability by self-report.
All participants were instructed to read four English donor words (sauna
/ˈsɔːnə/, guitar /ɡɪˈtɑ:/, carnivals /ˈkɑ:nɪvəlz/ and vanilla /vəˈnɪlə/), and CS also
read the corresponding Cantonese loanwords (桑拿 /sɔŋ55 na:21/, 結他 /kit33
tha:55/, 嘉年華 /ka:55 nin21 wa:21/ and 呍哩拿 /wɐn22 nei55 la:35/). The
speech samples were recorded using Audacity in a quiet room with a
high-quality unidirectional dynamic microphone fixed at 10 cm from each
participant’s mouth for consistency.
The recording of each participant was first processed using Praat
(Boersma & Weenink, 2010). Each syllable in the pronounced English donor
words and Cantonese loanwords was extracted and stored. The extracted
syllables of both the English donor words and Cantonese loanwords were
then classified into two types, (1) stressed syllables or those corresponding
to stressed syllables in the English donor words, and (2) unstressed syllables
or those corresponding to unstressed syllables in the English donor words.
The vowels were segmented manually by one of the authors, and ten
percent were re-segmented to measure intra-judge reliability, which was
regarded as satisfactory: Spearman’s correlation coefficient between the
durations of the segmented vowels was 0.997 (p < 0.001). Three acoustic
parameters, average fundamental frequency (F0, in Hz), duration (in ms) and
average intensity (in dB) of the vowel, were measured from each sound sample.
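The intra-judge reliability check can be illustrated with a minimal sketch of Spearman's rank correlation between two segmentation passes over the same vowels. The duration values are hypothetical placeholders, not data from the study, and the function names are illustrative; the coefficient is computed as Pearson's correlation on average ranks.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical vowel durations (ms) from a first and a repeated segmentation.
first_pass = [112.0, 85.5, 140.2, 97.3, 63.8]
second_pass = [110.4, 87.0, 141.0, 95.9, 64.1]
rho = spearman(first_pass, second_pass)
print(f"Spearman's rho = {rho:.3f}")
```

A coefficient close to 1 indicates that the repeated segmentation orders the vowels by duration in nearly the same way as the first pass.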
Results
Concerning the production of the English donor words, vowel duration
(instead of F0 in CS as identified previously) was found to be the dominant
cue in distinguishing stressed and unstressed syllables in both NS and CS.
However, HCS (with a difference of 32% between stressed and unstressed
English syllables) appeared to be more similar to NS (with a difference of
51%) in relying on vowel duration when compared with LCS (with a
difference of only 15%). While F0 was the next dominant cue for both NS
and CS, HCS (with a difference of 20%) relied on F0 more than both NS and
LCS (with a difference of 13% and 10% respectively) did.
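The text does not state how the percentage differences between stressed and unstressed syllables were computed. A minimal sketch, assuming a relative difference taken against the unstressed value, shows how cues could be compared and ranked; the mean values are hypothetical placeholders and the function names are illustrative.

```python
def percent_difference(stressed, unstressed):
    """Relative difference of the stressed over the unstressed value, in percent.

    Assumed formula; the study's own computation is not given in the text.
    """
    return (stressed - unstressed) / unstressed * 100.0


def rank_cues(means):
    """Order cues from the largest to the smallest stressed/unstressed difference."""
    diffs = {cue: percent_difference(stressed, unstressed)
             for cue, (stressed, unstressed) in means.items()}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (stressed, unstressed) means for one speaker group.
MEANS = {
    "duration_ms": (150.0, 100.0),
    "f0_hz": (220.0, 200.0),
    "intensity_db": (72.0, 70.0),
}

ranking = rank_cues(MEANS)
for cue, diff in ranking:
    print(f"{cue}: {diff:.1f}%")
```

Under this assumption, the cue with the largest percentage difference would be the one described as dominant for that group.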
Since Cantonese makes use of tones but not stress to contrast meanings,
Cantonese loanword syllables corresponding to stressed and unstressed
English syllables are supposed to differ only in F0 but not in intensity and
vowel duration. Surprisingly, vowel duration was still the dominant cue,
followed by F0 and intensity, in both HCS and LCS’s production. Despite
this, HCS showed a difference of only 2% between the English donor words
and the Cantonese loanwords in their use of F0 to distinguish the
(originally) stressed and unstressed syllables, whereas LCS showed a marked
difference of 28% in their use of vowel duration. These figures further
confirm HCS’s overuse of F0 and LCS’s underuse of vowel duration in
realising English word stress.
Conclusion
In short, unlike previous findings, vowel duration was found to be the
dominant cue, followed by F0, in distinguishing stressed and unstressed
syllables in all speakers’ production. Also, HCS may have overused F0 and
LCS may have underused vowel duration. This implies the need for different
approaches in teaching English word stress, with less emphasis on F0 for
HCS and more emphasis on vowel duration for LCS.
Acknowledgements
The work described in this paper was substantially supported by a grant from the
College of Professional and Continuing Education, an affiliate of The Hong Kong
References
Boersma, P., & Weenink, D. 2010. Praat: doing phonetics by computer. Retrieved
July 20, 2011, from http://www.fon.hum.uva.nl/praat/
Chan, M. K. K. 2007. The Perception and Production of Lexical Stress by Cantonese
Speakers of English. M.Phil Dissertation. Hong Kong: The University of Hong
Kong.
Lai, W. 2004. Tone-stress Interaction: A study of English Loanwords in Cantonese.
M.Phil Dissertation, The Chinese University of Hong Kong, Hong Kong.
Lai, W. W. S., & Ng, M. L. 2014a. English Donor Words and Equivalent Cantonese
Loanwords Pronounced by Hong Kong Cantonese ESL Learners - Implications
for Teaching English Word Stress. Proceedings of International Teacher
Education Conference, Dubai (pp.19-28). Dubai: Ankara University.
Lai, W. W. S., & Ng, M. L. 2014b. The Use of Acoustics-based Teaching Software
in Hong Kong Cantonese ESL Speakers’ Learning of English Word Stress
Production. Proceedings of the 6th Annual International Conference on
Education and New Learning Technologies, Barcelona (pp. 5773-5781).
Barcelona: IATED.
Lai, W. W., Wang, D., Yan, N., Chan, V., & Zhang, L. 2011. Influence of English
Donor Word Stress on Tonal Assignment in Cantonese Loanwords - An
Acoustic Account. In W. Lee & E. Zee (Eds.), Proceedings of the 17th
International Congress of Phonetic Sciences (pp. 1162-1165). Hong Kong: City
University of Hong Kong.
Luke, K. K. 2000. Phonological Re-interpretation: The Assignment of Cantonese
Tones to English Words. ICCL-9 Conference Paper. Singapore: National
University of Singapore.
Hong Kong Examination Authority. 2004. IELTS (2004). Retrieved from
http://www.hkeaa.edu.hk/en/ir/Standards_of_HKEAA_qualifications/IELTS/
Hong Kong Examination Authority. 2010. Results of the Benchmarking Study
between IELTS and HKDSE English Language Examination [Press Release].
Retrieved from
http://www.hkeaa.edu.hk/DocLibrary/MainNews/press_20130430_eng.pdf
Silverman, D. 1992. Multiple Scansion in Loanword Phonology: Evidence from
Cantonese. Phonology, 9, 289-328.
Zhang, R. 1986. Xianggang Guangzhouhua Yingyu yinyi jieci de shengdiao guilü [=
the tonal patterns of English loanwords in Hong Kong Cantonese]. Zhongguo
yuwen, 1, 42-50.
Cognitive approach to translation and
interpreting teaching methods
Julia Levi
Department of the English, MGIMO University, Russia
Abstract
Translation/interpreting studies now focus on human mental processes,
cognition, and the role of the interpreter/translator. According to human activity
theory, each action is purpose-oriented; thus the complex act of
translation/interpreting, which can be described as a secondary process of human
activity, is goal-oriented as well. This means that the act of interpreting/translation
corresponds to the main principles of human activity, has its own purpose, and is
aimed at achieving the same result as an ordinary act of communication, i.e. a
communication effect. We believe it is critical to approach a text for
translation purposes by first making a deliberate pre-translation text analysis (PTA),
which, according to most experts, may consist of several activities.
Key words: cognition, the act of interpreting/translation, pre-translation text
analysis
Introduction
A new paradigm of language studies allowed linguists in the late 20th and
early 21st centuries to consider language as a dynamic phenomenon rather
than a static product. Experts in translation/interpreting studies have
consequently become more interested in exploring the basic principles
of the process of translation/interpreting, shifting towards the study of
human mental processes, cognition, and the role of the
interpreter/translator. At the first stage of the development of
translation/interpreting science, scholars focused on the analysis and
description of objective laws and rules of transformation. Later, a
new approach focusing on the nature of the process of
translation/interpreting was put forward, made possible by advances
in research in psycholinguistics, sociolinguistics,
cognitive linguistics, anthropology, etc. The roots of a cognitive
approach can be traced back to the ideas of such renowned linguists as F. de
Saussure, L. Vigotskyi, L. Sherba, A. N. Leontiev, A. A. Leontiev, and many
others. They developed and implemented strategies of linguistic
study which consider language as a part of human activity, with the
human playing the central role in it. According to human activity theory,
each action is purpose-oriented; thus the complex act of
translation/interpreting, which can be described as a secondary process of
human activity, is goal-oriented as well. This means that the act of
interpreting/translation corresponds to the main principles of human activity,
has its own purpose, and is aimed at achieving the same result as an ordinary
act of communication, i.e. a communication effect.
What allows translators/interpreters to achieve the same communication
effect and evoke the same feelings and emotions in the target recipient? We
believe that a profound comprehension of the original text and successful
meaning construction produce the communication effect envisaged by the
author of the original text.
Methodology
According to J. Field, central to meaning construction is the distinction
between 1) the words on the page or in the ear; 2) the propositional
information that a text contains (loosely, its literal meaning); and 3) the
enriched and selective interpretation which a reader or listener takes away.
In processing a text, a comprehender performs a number of operations. At
sentence level they 1) extract propositional information; 2) make any
necessary inferences; 3) enrich the interpretation by applying world
knowledge; 4) integrate the new information into their mental representation
of the text so far; 5) monitor their comprehension in case of
misunderstanding.
At discourse level, they also have 1) to recognize the hierarchical
structure of the text; 2) identify patterns of logic which link the parts of the
text; 3) determine which parts of the text are important to the speaker/writer
or relevant to their own purposes.
Numerous accounts of discourse comprehension which attempt to
describe how text information is built into an overall meaning representation
have proved to be useful both for scholars and learners. A cognitive
approach to text studies helps linguists perceive information processing
mechanisms better and therefore work out some strategies to secure a full
understanding of the text.
Nevertheless, comprehension is only one of the stages that the model of
translation/interpreting comprises. The model consists of three stages:
comprehension, the act of translation/interpreting, and text production.
At the level of comprehension the translator/interpreter builds the
concept of the text. When they perceive the original text in a foreign
language, they search for semantic frame equivalents to their knowledge.
Charles Fillmore believes that “meanings are relativized to scenes”.
According to him, meanings have an internal structure which is determined
relative to a background frame or a scene. What is more, during the text
processing a so-called process of anticipation plays an important role as it
helps to predict the final unfolding of the text through the explanation of
Results
Most experts suggest that a pre-translation text analysis (PTA) may consist
of several activities: 1) considering factors external to the linguistic text; 2)
establishing the style and genre of the text; 3) designating the type of the
information represented in the text. The succession of these stages may vary,
but all the existing models of PTA illustrate 1) textocentric (linguistic); 2)
functional; 3) communicative approaches to this process.
On the basis of the conceptions of PTA proposed by U. Breus and N. Valeeva, we
present a full PTA, which ensures a better comprehension of the text and a
well-balanced approach to the selection of a translation strategy.
1. Identify the type of the text (narration, description, etc.) and its
functional style (scientific, publicist, official, colloquial, etc.).
2. Outline the basic communication goal of the author, his/her
intention, cultural/situational factors.
3. Specify the primary and secondary functions of the text (to inform,
communicate, exert influence), which can be understood through
explicit or implicit markers.
4. Outline the context.
5. Define the main topic of the text.
6. Specify the stylistic devices of the author.
Conclusion
A cognitive approach has proved to be efficient in translation/interpreting
teaching methodology: it explains the cognitive functions of the human
mind, provides a profound analysis of the translation/interpreting model and,
through a well-elaborated pre-translation analysis (PTA), helps learners apply
the right translation strategy and thus master the art of
translation/interpreting.
References
Alekseeva, I.S. 2004. Vvedenie v perevodovedenie. – M.: Izd. centr «Akademiya».
Breus, E.V. 2007. Kurs perevoda s anglijskogo yazyka na russkij. Uchebnoe
posobie. – M.: Valent.
Field, J. 2004. Psycholinguistics: The Key Concepts. London and New York:
Routledge, Taylor & Francis Group.
Fillmore, C. 1977. The case for case reopened. In Syntax and Semantics 8:
Grammatical Relations, ed. P. Cole, 59 – 81. New York: Academic Press.
Nefedova L.A., Remhe I. N. Kognitivnye osobennosti perevodcheskogo processa. -
Chelyabinskij gosudarstvennyj universitet. - S. 64-72
Shvejcer A.D. Teoriya perevoda (status, problemy, aspekty).
Valeeva, N.G. 2010. Teoriya perevoda: kul'turno-kognitivnyj i kommunikativno-
funkcional'nyj aspekty: Monografiya. – M.: RUDN.
Zimnyaya, I.A. 2001. Lingvopsihologiya rechevoj deyatel'nosti. — M.: Moskovskij
psihologo-social'nyj institut, Voronezh: NPO «MODEHK». (Seriya «Psihologi
Otechestva»).
Perception of reduced words: Chunking and
predictability
David Lorenz1, David Tizón-Couto2
1 English Department, Albert-Ludwigs-Universität Freiburg, Germany
2 Facultade de Filoloxía e Tradución, Universidade de Vigo, Spain
Abstract
This is a first report on a word-monitoring experiment to examine how frequency-
based chunking and predictability affect recognition of reduced speech. The effect of
reduction on recognition of the word to was tested in English V to Vinf constructions
of varying frequencies (e.g. have to go, prefer to stay). Our first results suggest that
in types of mid-high frequency, predictability aids the recognition of a reduced item.
In very high frequency sequences, however, reduction seems to encourage chunking,
that is, accessing the sequence as a single unit.
Key words: chunking, reduction, frequency, speech perception
Introduction
It has long been noted that certain multi-word sequences undergo
phonological reduction and contraction to a single word (e.g. want to >
wanna). In usage-based approaches, this is seen as a matter of coalescence,
or chunking, which in turn has been linked to frequency (i.a. Bybee 2006,
Ellis et al. 2009). Thus high-frequency sequences will be stored in the mind
as a single unit. They have a propensity for reduction due to neuromotor
routines (Bybee 2006), but the reduced forms may be more or less strongly
represented in the language user’s mind, on a gradient cline from on-line
reduction in articulation to stored, fixed variants (Connine & Pinnow 2006,
Lorenz 2013).
Most of the evidence for chunking and the gradient status of reductions
concerns language production only, which raises the question of how they affect
speech perception. There is some evidence that full canonical forms
generally serve the listener best (Tucker 2011, Pitt et al. 2011). In a word
recognition experiment, Sosa & MacFarlane (2002) show that listeners treat
highly frequent sequences as chunks, leading to a delayed recognition of
elements of the sequence (e.g. of in kind of). Their design did not, however,
consider these sequences’ propensity for reduction (e.g. “kinda”) and its
effect on word recognition. In a similar study Kapatsinski & Radicke (2009)
find a U-shaped frequency effect, such that word recognition is delayed in
sequences of both very high and very low frequency. They suggest that
frequent co-occurrence increases the predictability of a word, hence
facilitates its recognition, and that this is offset by chunking and low salience
in collocations of very high frequency.
The present study builds on this, testing the import of string
frequency and reduction on speech perception. It employs constructions of
the type V to Vinf (e.g. need to work, dare to go) to measure response times
to the word to.
The crucial question is how frequency and reduction interact. In high
frequency collocations, listeners may have an active knowledge of the high
probability of to based on frequency, leading to a higher expectation of
reduction (cf. Jurafsky et al. 2001); in this case reduction would not strongly
affect recognition times. On the other hand, listeners may have a chunked
item available; in that case a reduced form would lead them to access this
chunked variant and considerably delay recognition of to.
Experiment design
The stimuli consist of 126 recorded sentences in American English. 42 of
these contain a V to Vinf construction (the target items), 42 contain to in a
different construction (control items), 42 do not contain to at all (distractors).
Native speakers of American English were asked to respond to the presence
or absence of to as accurately and quickly as possible. Response times were
measured from the onset of to.
The V to Vinf sequences are of varying frequencies, as taken from the
Corpus of Contemporary American English (COCA, Davies 2008-) – e.g.
trying to Vinf (high frequency), deign to Vinf (low frequency). Participants
were assigned to one of two groups; each group heard half of the target items
with a full pronunciation, the other half with a reduced to (e.g. need to as
“needa”). This reduction and the frequency of the sequence serve as
independent variables whose effect on response times is tested.
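The two-group assignment described above can be illustrated with a short sketch. It assumes a simple split in which each target item is heard in the full form by one group and in the reduced form by the other. The item labels and the function name are hypothetical, and the authors' actual assignment procedure may differ.

```python
def counterbalance(items):
    """Return {group: {item: condition}} so that each item is heard in the
    full form by one group and in the reduced form by the other."""
    half = len(items) // 2
    first, second = items[:half], items[half:]
    # Group A: first half full, second half reduced.
    group_a = {item: "full" for item in first}
    group_a.update({item: "reduced" for item in second})
    # Group B: the mirror image of group A.
    group_b = {
        item: "reduced" if cond == "full" else "full"
        for item, cond in group_a.items()
    }
    return {"A": group_a, "B": group_b}


# 42 hypothetical target item labels, matching the number of target sentences.
target_items = [f"target_{i:02d}" for i in range(1, 43)]
design = counterbalance(target_items)
print(sum(c == "full" for c in design["A"].values()), "full items in group A")
```

With this layout every participant hears 21 full and 21 reduced targets, and every item contributes responses in both conditions across groups.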
At the time of writing, the study is still ongoing. We present here a
sketch of the results from 22 participants, which gives a first impression of
the interplay of frequency and reduction.
Results
Overall, participants correctly identified to within 2000 milliseconds in
89.7% of cases (1658/1848). When comparing conditions, however, the
accuracy rate is significantly lower for reduced items than for fully
articulated ones (82.7% vs 94.4%).
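As a quick arithmetic check, the reported denominator is consistent with 22 participants each responding to the 84 sentences that contain to (42 targets plus 42 controls). That reading of the trial count is an assumption; the counts themselves come from the text.

```python
N_PARTICIPANTS = 22
N_TARGET, N_CONTROL = 42, 42   # sentences containing "to"

# Assumed: every participant contributes one trial per "to" sentence.
total_trials = N_PARTICIPANTS * (N_TARGET + N_CONTROL)
correct = 1658                 # correct identifications reported in the text

accuracy_pct = round(100 * correct / total_trials, 1)
print(total_trials, accuracy_pct)  # prints: 1848 89.7
```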
There is also a clear difference between full and reduced stimuli in the
response times of the correct responses. Recognition of reduced items is
significantly delayed compared to full items. The mean response times are:
Full to: 636 ms – Reduced to: 786 ms – Control: 683 ms
[Figure 1 plot: mean response time (msec, y-axis) by frequency bin of verb form + 'to' (bins 1-4, x-axis), for the conditions 'full' and 'reduced'.]
Figure 1. Response times to full and reduced to by frequency of V to Vinf type. The
p-values refer to Mann-Whitney U test of difference between ‘full’ and ‘reduced’ in
each frequency bin.
As Fig.1 shows, there is a clear difference between response times to full and
reduced items, except at mid-high frequencies (bin 3). Recognition of
reduced to is slowed down at low and very high frequencies. The pattern is
less clear for the fully pronounced items, where recognition appears to be
less sensitive to frequency.
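The per-bin comparison of full and reduced items uses the Mann-Whitney U test. As a rough illustration of the statistic itself, here is a pure-Python sketch that computes U with midranks for ties. It does not compute p-values, and the response times are hypothetical values, not data from the study.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples, using midranks for ties."""
    combined = sorted(
        (value, group) for group, sample in enumerate((x, y)) for value in sample
    )
    rank_sum_x = 0.0
    i, n = 0, len(combined)
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    return u_x, n1 * n2 - u_x


# Hypothetical response times (ms) for one frequency bin.
full_rt = [610, 640, 655, 700]
reduced_rt = [720, 760, 790, 810]
print(mann_whitney_u(full_rt, reduced_rt))
```

A U value near 0 for one sample means almost all of its observations are smaller than those of the other sample; turning U into a p-value requires the exact distribution or a normal approximation, which this sketch leaves out.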
Discussion
In low frequency collocations (bin 1), to is least predictable from context,
and reduction will be least expected; here its recognition is slowest in both
full and reduced forms.
Regarding the pattern for reduced items in Fig.1, our tentative
interpretation is that there is a frequency range (around or within bin 3) at
which to is highly predictable and reduction can be expected; therefore,
reduction does not inhibit recognition. At higher frequencies (bin 4), a
chunking effect sets in which inhibits recognition of the element and which
is reinforced by a reduced rendering. Possibly, this chunking also implies an
expectation of reduction, such that a reduced input leads the listener onto a
References
Bybee, J. 2006. From usage to grammar: The mind’s response to repetition.
Language 82(4), 711-733.
Connine, C. and Pinnow, E. 2006. Phonological variation in spoken word
recognition: Episodes and abstractions. The Linguistic Review 23, 235-245.
Davies, Mark. 2008-. The Corpus of Contemporary American English: 450 million
words, 1990-present. Available online at http://corpus.byu.edu/coca/.
Ellis, N., Frey, E. and Jalkanen, I. 2009. The psycholinguistic reality of collocation
and semantic prosody (1): Lexical access. In Römer, U. and Schulze, R. (eds.)
2009, Exploring the Lexis-Grammar Interface, 89-114. Amsterdam, John
Benjamins.
Jurafsky, D., Bell, A., Gregory, M. and Raymond, W. 2001. Probabilistic relations
between words: Evidence from reduction in lexical production. In Bybee, J. and
Hopper, P. (eds.) 2001, Frequency and the Emergence of Linguistic Structure,
229–254. Amsterdam, John Benjamins.
Kapatsinski, V. and Radicke, J. 2009. Frequency and the emergence of prefabs:
Evidence from monitoring. Formulaic Language 2, 499–520.
Lorenz, D. 2013. Contractions of English Semi-Modals: The Emancipating Effect of
Frequency. NIHIN Studies. Freiburg, Rombach.
Pitt, M., Dilley, L. and Tat, M. 2011. Exploring the role of exposure frequency in
recognizing pronunciation variants. Journal of Phonetics 39, 304-311.
Sosa, A. and MacFarlane, J. 2002. Evidence for frequency-based constituents in the
mental lexicon: collocations involving the word of. Brain and Language 83,
227-236.
Tucker, B. 2011. The effect of reduction on the processing of flaps and /g/ in
isolated words. Journal of Phonetics 39, 312-318.
Neurological state manifestation in infants’ and
children’s voice features
Elena Lyakso, Olga Frolova
Child Speech Research Group, Saint Petersburg State University, Russia
Abstract
This study aims to determine how the neurological state is reflected in the
voice features of infants and children. Two types of experiments were
conducted: a comparison of vocalizations of 0-3 months old infants having
neurological disorders (n = 45) and typically developed (TD) infants (n = 50);
and a comparison of speech features of TD children (n=30) with vocalization and
speech features of 5-16 years old children with autism spectrum disorders (ASD)
(n=30). The results of the study showed that infants’ vocalizations contain features
important for determining developmental risk. Differences between
children with ASD and TD children were revealed in the form of higher values of
pitch, pitch variability and formant characteristics.
Key words: voice features, children, ASD, neurological state.
Introduction
The human voice contains characteristics important for determining different
states and developmental risk. Since the 1950s, infant cry and pain
vocalizations have been studied for the purpose of diagnosing
neurological conditions (e.g. Wasz-Hockert, et al., 1968;
Xie, et al., 1996). More recent studies have focused on the acoustic
properties of speech production in autism spectrum disorders (ASD).
Abnormal prosody has been identified as a core feature of ASD (Bonneh, et
al., 2011); however, with respect to pitch values and pitch variation, the data are
contradictory (Nakai, et al., 2014). The goal of this study is to identify the
acoustic features specific to the vocalizations and speech of infants at
developmental risk and of children with ASD.
Method
Data collection
Participants in the study were 0-3 months old infants with neurological
disorders (ICD-10, 91.8, 91.9) (n = 45) and typically developed (TD) infants
(n = 50), and 5-14 years old TD children and children with ASD (F84.0; n=30).
The children with ASD had varying degrees of neurological disorder severity. They
were divided into two groups: those with developmental reversals at the age of
1.5 - 3.0 years (group 1, ASD-1) and those with developmental risk diagnosed at
birth (group 2, ASD-2).
Two types of experiments were conducted: a comparison of vocalizations
of infants with neurological disorders and TD infants; and a comparison of speech
features of TD children with vocalization and speech features of children with ASD.
TD children and children with ASD were compared in different emotional states,
which made it possible to find the variable characteristics of the voice.
Data analysis
Vocalizations and speech were recorded. A perceptual analysis
of vocalizations and speech was carried out (200 adult listeners). Spectrographic
analysis of speech was carried out in the Cool Edit (Syntrillium Soft. Corp. USA)
sound editor. The durations of vocalizations and pauses were measured. Pitch
values, spectral maximums, their amplitudes, and spectrum types were
determined. Pitch values (F0), min and max pitch values, pitch range (F0
max - F0 min), and the formant frequencies and amplitudes of vowels were
measured in speech. All procedures were approved by the Health and
Human Research Ethics Committee (HHS, IRB 00003875, St. Petersburg
State University).
Results
Infants’ vocalization features
The “noise” spectrum is present more frequently (p<0.01, Mann-Whitney test) in
the vocalizations of infants with neurological disorders than in vocalizations
of TD infants (figure 1).
[Figure 1 (caption partly lost in extraction): ... calm vocalizations of TD infants and infants with neurological risk. X-axis categories: cry, calm vocalization. ** p<0.01, Mann-Whitney test.]
[Figure 2. Vowel's average pitch value (F0, Hz) in discomfort, neutral and comfort states for the TD, ASD-1 and ASD-2 groups. ** p<0.01, *** p<0.001, Mann-Whitney test.]
The more severe the child’s disorder, the higher the pitch values and third
formant values and the lower the speech level. A Spearman correlation (p<0.05)
was revealed between the child’s group and both pitch values and third formant values.
Acknowledgements
The work was supported by Russian Foundation for Basic Research (grants 15-06-
07852а, 16-06-00024а).
References
Bonneh, Y.S., Levanov, Y., Dean-Padro, O., Lossos, L., Adini, Y. 2011. Abnormal
speech spectrum and increased pitch variability in young autistic children. Front.
Hum. Neurosci., 4. doi: 10.3389/fnhum.2010.00237
Paul, R., Augustyn, A., Klin, A., Volkmar, F. 2005. Perception and production of
prosody by speakers with autism spectrum disorders. Journ. Autism Dev.
Disord. 35, 205–220.
Nakai, Y., Takashima, R., Takiguchi, T., Takada, S. 2014. Speech intonation in
children with autism spectrum disorder. Brain and Devel., vol. 36, 6, 516-522.
Wasz-Hockert, O., Lind, J., Vuorenkoski, V, Partanen T, Valanne E. 1968. The
infant cry, a spectrographic and auditory analysis. London: Heineman Medical
Books.
Xie, Q., Ward R.K , Laszlo, C.A. 1996. Automatic assessment of infants’ levels-of-
distress from the cry signals. IEEE Trans. on Speech and Audio Proc. vol. 4,
253 - 265.
Features of written texts of people with different
profiles of Lateral Brain Organization of
Functions (on the Basis of RusNeuroPsych
Corpus)
Tatiana Litvinova1, Ekaterina Ryzhkova2, Olga Litvinova3
1 Regional Centre for Russian Language, Voronezh State Pedagogical University, Russia
2 Department of Russian Language, Voronezh State University of Engineering Technologies, Russia
3 Department of English Language, Voronezh State Pedagogical University, Russia
Abstract
The aim of the study is to detect typological characteristics of written texts
created by people with different profiles of the lateral brain organization of functions
(LBOF). The material of the study is a special Russian text corpus, RusNeuroPsych,
containing metadata about the LBOF (motor, sensory, cognitive) of the texts' authors.
Numerical values of a range of formal language parameters (index of lexical
diversity, frequencies of parts of speech, etc.) were extracted from 242 texts, and
statistically significant correlations (at the 0.05 level) between these values
and the LBOF of the authors were identified for the first
time for Russian texts.
Key words: written text, Russian, neuropsychology, brain lateralization, text corpus.
Background
One of the most important neuropsychological characteristics reflecting
individual differences in the joint operation of the human brain hemispheres
(asymmetry) is the lateral brain organization of functions (LBOF,
Khomskaya et al. 1997). It is considered the foundation for the typology of
individual differences of the mental condition of healthy individuals as part
of a study in neuropsychology of individual differences. Neuropsychology of
individual differences is an application of neuropsychological concepts and
methods to the assessment of healthy subjects that tries to explain normal
functioning by using principles of cerebral organization, particularly
characteristics of interhemispheric asymmetry and interaction (Glozman
2004, 838). The studies by Khomskaya et al. (1997) showed a stable
correlation between the types of LBOF and different aspects of the cognitive,
motor and emotional activity of normal subjects, which means that we
have a sound foundation for the norm typology.
Experimental study
Material
In order to address this problem it is necessary to create the corpus of written
texts containing information about the type of LBOF of their authors. The
text corpus RusNeuroPsych created under the guidance of the authors
currently contains 643 Russian-language written texts by 447 authors (native
Russian speakers) from 12 to 35 years of age. RusNeuroPsych corpus
contains metadata in the form of information about their authors: year of
birth, gender, native language, education, the results of psychological testing
and survey for identifying their motor, sensory and cognitive lateral profile
using the most indicative and simple tests (see Sirotyuk 2003, Semago 2005,
Balonov 1985). The index of the lateral brain organization (motor, cognitive,
sensory, as well as individually for hands, legs, eyes, ears) was calculated as
the difference between the number of “right”, “left” and “symmetrical”
answers divided by the number of tests. An integral index of LBOF was
also computed as the difference between the number of “right”, “left” and
“symmetrical” answers divided by the number of tests.
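The wording of the index leaves the exact formula open. One plausible reading, offered here only as an assumption, is a signed laterality coefficient: the number of "right" answers minus the number of "left" answers, divided by the total number of tests, with "symmetrical" answers counting only towards the denominator. The function name and sample answers below are hypothetical.

```python
from collections import Counter

VALID_ANSWERS = {"right", "left", "symmetrical"}


def laterality_index(answers):
    """Assumed index: (right - left) / number of tests, ranging from -1 to 1.

    "symmetrical" answers add to the number of tests but not to the difference.
    """
    if not answers:
        raise ValueError("at least one test result is required")
    counts = Counter(answers)
    unknown = set(counts) - VALID_ANSWERS
    if unknown:
        raise ValueError(f"unexpected answers: {sorted(unknown)}")
    return (counts["right"] - counts["left"]) / len(answers)


# Hypothetical results of four motor tests for one respondent.
print(laterality_index(["right", "right", "left", "symmetrical"]))
```

Under this reading, positive values indicate a predominance of right-sided answers and negative values a predominance of left-sided ones.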
For the present study 242 texts by 121 respondents (each respondent
wrote two texts – letter to a friend and description of a picture) aged from 24
to 35, 17 men, 104 women, were selected. The average length of text is 165
words.
Methods
The texts were annotated with the help of the morphological analyzer
pymorphy2 and the online service istio.com, and the numerical values of the
formal-grammar parameters of the texts were obtained (indices of lexical
diversity, frequencies of different parts of speech and their ratios, and
other frequent parameters that occur in texts regardless of their topic and
genre, 22 in total). SPSS Statistics software was used to calculate the
Pearson correlation coefficient between the text parameters and the indices of
LBOF. Two series of experiments were conducted: in the first, both texts by the
same author were treated as one (“a sum corpus”); in the second, the two
texts were treated individually (“an individual corpus”).
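To make the analysis concrete, the following sketch computes one of the text parameters, the type-token ratio (TTR), and correlates it with a laterality index using the Pearson coefficient. It is an illustration under simplifying assumptions: whitespace tokenization stands in for the morphological analysis, plain Python stands in for SPSS, and the sample texts and index values are hypothetical.

```python
import math


def type_token_ratio(tokens):
    """Lexical diversity: number of distinct word forms / number of tokens."""
    if not tokens:
        raise ValueError("empty text")
    return len({t.lower() for t in tokens}) / len(tokens)


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical (text, laterality index) pairs, for illustration only.
samples = [
    ("the cat saw the cat", -0.5),
    ("a dog and a bird sang", 0.0),
    ("red kites drift over hills", 0.5),
]
ttrs = [type_token_ratio(text.split()) for text, _ in samples]
indices = [index for _, index in samples]
print(round(pearson_r(ttrs, indices), 2))
```

In the "sum corpus" setting, the two texts of one author would be concatenated before computing TTR; in the "individual corpus" setting, each text would enter the correlation separately.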
Results
Significant correlations (at the 0.05 level) between the formal-grammar parameters
of written texts and the type of LBOF of their authors were found in both
series of experiments. The largest numbers of correlations
(r = 0.27-0.41) were found with
LBOFmotor (8), LBOFhands (8) and LBOFintegral (7). Far fewer
significant correlations were found with the indices of sensory and cognitive
asymmetry, except LBOFeyes (5). The indices
LBOFhands, LBOFmotor and LBOFintegral correlated positively with the index of
lexical diversity (TTR) and negatively with the proportion of function words +
pronouns, the proportion of function words, the proportion of cognitive words,
the proportion of full stops, and the proportion of the 100 most frequent Russian
words; i.e. the more right-sided properties there are in a person's LBOF, the
higher the lexical diversity of their texts and the fewer function words,
pronouns, full stops and most frequent words they contain.
Acknowledgements
The study is financially supported by the grant of RFBR “Linguistic Parameters of a
Written Text and Neuropsychological Characteristics of its Author: A Corpus
Study”, project number 16-36-00036.
References
Balonov L. Ya., Deglin, V. L. and Chernigovskaya Т. V. 1985. Functional Brain
Asymmetry in Speech Organization. In Sensory Systems. Sensory Processes in
Hemisphere Asymmetry. Leningrad, Science.
Glozman, J. 2004. Russian neuropsychology after Luria. In Craighead, W.,
Nemeroff Ch. (eds.). The Concise Corsini Encyclopedia of Psychology and
Behavioral Science. NY, Wiley & Sons.
Gudkova, Т. V. 2010. Features of Functional Sensomotor Asymmetry in Preschool
Children with a General Speech Disorder. PhD thesis. Saint Petersburg, Herzen
State Pedagogical University of Russia.
IBM SPSS Statistics 22 Documentation.
http://www-01.ibm.com/support/docview.wss?uid=swg27038407#ru
Juola, P., Neocker, Jr. J., Stolerman, A., Ryan, M., Brennan, P. and Greenstadt, R.
2013. Keyboard Behavior Based Authentication for Security. IEEE IT
Professional, 15, 4, 8-11, July-Aug.
Khomskaya, Ye.D., Yefimova, I.V., Budyka, Ye.V. and Yenikolopova, Ye.V. 1997.
Neuropsychology of Individual Differences (Left-Right Brain and Mental
Condition). Moscow, Russian Pedagogical Agency.
Litvinova, G. V. 2013. Effect of Lateral Organization on the Formation of Speech in
Children. Petropavlovsk-Kamchatskiy, Vitus Bering Kamchatka State
University.
Semago, N. Ya. and Semago, М. 2005. Theory and Practice of the Evaluation of
Mental Development of a Child. Preschool and Junior School Age. Saint
Petersburg, Rech.
Shubin, А.V. and Serpionova, Ye.I. 2007. Brain Asymmetry and Features of Verbal
Creativity. Voprosy psikhologii 4, 89-97.
Sirotyuk, А.L. 2003. Neuropsychological and Psychophysiological Learning
Component. Moscow, Sfera.
Semantic differential as a method in empirical
investigation of Self-Image as father
Robert Manerov1, Kristina Manerova2
1 Department of Psychology and Pedagogy, University of Emercom, Russia
2 Department of German Linguistics, Saint Petersburg University, Russia
Abstract
In our current study, psychosemantic principles are used to develop our own
method, “Father Image”, based on Ch. Osgood's semantic differential method.
The following images serve as conceptual constructs in the study: Father Image,
Self-Image as a father, Real Self-Image as a father, Constructive Self-Image as a
father. The semantic differential provides a measure of 47 signs of the “Father
Image”, expressed on bipolar seven-point scales. The 47 scales are labelled with
antonym pairs of Russian adjectives and contradictory propositions, which were
composed using the modified method of M. Kuhn and T. McPartland and then
evaluated by groups of single and married male participants with and without
children.
Key words: semantic differential, Self-Image, Father Image
Introduction
The study was based on the psychosemantic approach (cf. V. Petrenko, A.
Shmelev, Ch. Osgood, J. Kelly et al.). We used psychosemantic principles to
develop our own method called “’Father image’ semantic differential” and to
interpret the findings. Currently, the main goals of the psychosemantic
approach include building and reconstruction of the individual value system
through which the subject perceives the world, other people and himself. Our
own “’Father image’ semantic differential” method was based on Osgood’s
semantic differential, which is a part of experimental psychosemantics.
Table 2. Factors of the real and constructive Self-Image as a father in men with
different paternal and marital status (factor power in parentheses).
Self-Image as father | # | Married fathers | Married men without children | Unmarried men without children
REAL | 1 | Morality (0,178) | Syncretism (0,180) | Syncretism (0,181)
REAL | 2 | The object of love (0,139) | Caring and trustworthy teacher (0,145) | The object of pride and love (0,158)
REAL | 3 | Social Activity (0,097) | The object of love (0,124) | Strong Personality (0,137)
REAL | 4 | Caring and trustworthy teacher (0,071) | Social Activity (0,089) | Social Activity (0,086)
REAL | 5 | Kindness (0,057) | Mentor (0,065) | Democratic (0,055)
Conclusions
A comparative analysis of the six factor structures of real and constructive
Self-Image as a father in three groups of men led us to the following
conclusions.
In both groups of men without children the constructive image is more
geared towards the child and the family than the real one. The constructive
image, unlike the real image, includes the “Supports the family”
characteristic, which is not pronounced in the real fathers’ group. It is likely
that for most men who have not become fathers yet the issue of providing for
the family becomes the central one in whether to have a child or not.
The real Self-Image as a father in married fathers is much more realistic
and moderate than in the two groups of men without children, while the
constructive (in many ways, ideal) father for them is a tentative model,
distant from the real requirements and only partially realized in practice, as
evidenced by the fathers’ real experience.
The application of a linguistically determined semantic differential method,
with subsequent factorization and quantitative assessment, is justified in
psychological research.
References
Manerov, R.V. 2013. The Self-Image as a father in the Men-Self-Concept. Candidate
of Psychological Sciences thesis, Herzen State Pedagogical University of
Russia, URL: http://elibrary.ru/item.asp?id=22372944 (in Russian).
Manerov, R.V., Posokhova, S.T. 2012. The Factor analysis structure of the Self-
Image as a father by men with different paternal and marital status. In: A young
scientist in the modern science world: new aspects of the scientific search. –
L&L Publishing, 189-197 (in Russian).
Manerov, R.V., Posokhova, S.T., Lippo, S.V. 2008. The father image and the
personal self-actualization In: Herald of St. Petersburg University. Psychology,
Sociology, Pedagogy, Nr. 3, 23-30 (in Russian).
Manerov, V. Kh. 2012. The experience of semantic differential approach in the
research of the audio perception of verbal message. In: Traditions and
innovations in Psychology in Russia. Proceedings of the International
conference, dedicated to 215th Anniversary of the Herzen State Pedagogical
University of Russia, 450-455 (in Russian).
Automatic assignment of labels in Topic
Modelling for Russian Corpora
Aliya Mirzagitova, Olga Mitrofanova
Department of Mathematical Linguistics, St. Petersburg State University, Russia
Abstract
The main goal of this paper is to improve topic modelling algorithms by
introducing automatic topic labelling, a procedure which chooses a label for a cluster
of words in a topic. Topic modelling is a widely used statistical technique which
reveals the internal conceptual organization of text corpora. We have chosen an
unsupervised graph-based method and elaborated it with regard to Russian. The
proposed algorithm consists of two stages: candidate generation by means of
PageRank and morphological filters, and candidate ranking. Our topic labelling
experiments on a corpus of encyclopedic texts on linguistics have shown the
advantages of labelled topic models for NLP applications.
Key words: topic modelling, topic labelling, Russian corpora.
Introduction
In recent years, topic modelling has become one of the most fruitful
statistical NLP procedures for revealing the internal conceptual
organization of text corpora. A topic model is constituted by a family of
probability distributions over a set of topics extracted from a corpus, a set of
words occurring in the corpus and a set of texts forming the corpus. Various
algorithms of topic modelling (LSA, pLSA, LDA etc.) have been
successfully applied to English corpora (Daud et al. 2010) in research
dealing with information retrieval, content analysis, WSD, machine
translation, etc. However, Russian corpora are seldom involved in topic
modelling procedures. Certain positive results have been described in
(Mitrofanova 2015). Our project tries to fill in this gap.
The resulting topics are commonly represented as the top n terms with the
highest probabilities, which often makes their proper and accurate
interpretation a great challenge. Assigning a topic label, i.e. a single word or a
phrase able to describe the semantics of a given topic, significantly assists in
this task. In most works on topic modelling, topic labelling is
conducted manually, which is a tedious process prone to subjectivity.
Numerous techniques of automatic topic labelling have been proposed
for English texts, including those relying only on the content of a
given corpus (Mei et al. 2007), and those requiring external resources like
Wikipedia (Lau et al. 2011) or various ready-made ontologies. All of them
are two-stage methods varying in the means of generating and ranking
Methodology
Candidate Generation
In order to generate candidate labels, the first 10 topic words are used to
query a search engine. After that, the titles of the top 30 search results are
combined into a text, which is then tokenised and lemmatised. Subsequently,
a directed text graph G = (V, E) is created, where V is a set of nodes
containing lemmas and E is a set of edges. Two nodes are connected if the
respective lemmas occur within a window of ±2 words. We experimented with
three approaches to the weighting of the graph.
I. All of the edges are equal to 1 (unweighted graph).
II. The edges are weighted according to the co-occurrence frequency for
corresponding lemmas calculated inside the given text.
III. The edges are weighted with PMI values computed using the Russian
Wikipedia as a referential corpus (228 million tokens).
Next, the PageRank value (Mihalcea 2004) is computed for each node.
The obtained text graph now takes the following form: more important
words have larger nodes with higher PageRank values, while more
semantically related bigrams have thicker edges with bigger weight.
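The graph construction and scoring step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the toy lemma list are hypothetical, the edges are stored in both directions as a simplification, and only weighting schemes I (unweighted) and II (co-occurrence counts) are covered, since scheme III needs PMI values from a reference corpus.

```python
from collections import defaultdict


def build_graph(lemmas, window=2, weighted=False):
    """Link lemmas that co-occur within +/- `window` tokens of each other."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, source in enumerate(lemmas):
        for target in lemmas[i + 1 : i + 1 + window]:
            if source == target:
                continue
            # Assumption: every link is stored in both directions.
            for a, b in ((source, target), (target, source)):
                # Scheme II counts co-occurrences, scheme I keeps weight 1.
                graph[a][b] = graph[a][b] + 1.0 if weighted else 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over the weighted graph."""
    nodes = list(graph)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Links are symmetric, so the neighbours are also the in-links.
            incoming = sum(
                rank[nb] * graph[nb][node] / out_weight[nb] for nb in graph[node]
            )
            new_rank[node] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


# Hypothetical lemmatised text built from search-result titles.
lemmas = ["topic", "model", "label", "topic", "model", "corpus"]
ranks = pagerank(build_graph(lemmas))
for lemma, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{lemma}\t{score:.3f}")
```

Passing `weighted=True` to `build_graph` switches from scheme I to scheme II without changing the PageRank routine.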
Since Wikipedia does not have individual articles for most technical
terms, we cannot verify the validity of a candidate label by checking whether
it is an article title, as proposed in previous approaches. Therefore,
appropriate n-grams are filtered from the text graph according to the
following morphological patterns: Adj + N, N + N in genitive case, N + Prep
+ N, N + Conj + N, etc. The contact phrases are concatenated into a single
group and added as a supplementary candidate label.
Candidate Ranking
The second stage consists in ranking the extracted candidates. We examine
the following three possible ranking metrics for each phrase label.
A. Simply summing the scores of the constituent words.
B. Normalizing the sum of the scores with regard to the phrase length.
C. Multiplying the sum by the coefficient calculated as , where i is
the position of the topic word in the original query. Thus we use the
information about the probability of a constituent word belonging to the
topic.
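Metrics A and B can be illustrated with a short sketch. The word scores and candidate phrases below are invented for the example, and the function names are hypothetical. Metric C is left out on purpose, because its coefficient is not legible in the text above.

```python
def score_sum(phrase, word_scores):
    """Metric A: sum of the PageRank scores of the constituent words."""
    return sum(word_scores.get(word, 0.0) for word in phrase)


def score_length_normalised(phrase, word_scores):
    """Metric B: the same sum divided by the phrase length in words."""
    return score_sum(phrase, word_scores) / len(phrase)


def rank_candidates(candidates, word_scores, metric):
    """Order candidate labels from best to worst under the given metric."""
    return sorted(candidates, key=lambda p: metric(p, word_scores), reverse=True)


# Hypothetical PageRank scores and candidate labels (tuples of lemmas).
word_scores = {"topic": 0.3, "model": 0.3, "label": 0.2}
candidates = [("label",), ("topic", "model"), ("topic", "model", "label")]

best_a = rank_candidates(candidates, word_scores, score_sum)[0]
best_b = rank_candidates(candidates, word_scores, score_length_normalised)[0]
print(best_a, best_b)
```

In this toy case metric A prefers the longest phrase, because every extra word adds to the sum, whereas metric B prefers the two-word phrase with the highest average score.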
Experimental Evaluation
For the experiments, we collected a corpus of Russian encyclopaedic texts on
linguistics consisting of 1,900 documents with a total of 1.3 million tokens.
After pre-processing, that is, lemmatising with the open-source tool
pymorphy2 and removing stop words, the size of the experimental corpus
was reduced to 800,501 tokens.
We performed a series of topic modelling experiments with the LDA
algorithm as implemented in the scikit-learn package for Python and
obtained 20 topics, i.e. non-structured clusters of semantically related words.
Finally, we automatically assigned a label to each topic.
Label ranking | Graph weighting I | Graph weighting II | Graph weighting III
A | 2.01 | 2.03 | 2.07
B | 1.63 | 1.70 | 1.87
C | 1.70 | 1.73 | 1.77
Baseline: 1.03
Discussion
In this study, we address the gap in topic modelling for Russian corpora and
present an algorithm for automatic assignment of topic labels adapted for
Russian. It is based on the method described in (Aletras, Stevenson 2014),
References
Aletras N., Stevenson M. 2014. Labelling Topics using Unsupervised
Graph-based Methods. In Proc. of the 52nd Annual Meeting of the Association
for Computational Linguistics, vol. 2, 631-636, Baltimore, USA.
Daud A., Li J., Zhou L., Muhammad F. 2010. Knowledge discovery through
directed probabilistic topic models: a survey. Frontiers of Computer Science in
China 4, 280–301.
Lau J., Grieser K., Newman D., Baldwin T. 2011. Automatic Labelling of Topic
Models. In Proc. of the 49th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies, vol. 1, 1536–1545,
Stroudsburg, USA.
Mei Q., Shen X., Zhai C. 2007. Automatic labeling of multinomial topic models. In
Proc. of the 13th Intern. Conference on Knowledge discovery and data mining,
490, New York, USA.
Mihalcea R. 2004. TextRank: Bringing Order into Texts. In Proc. of EMNLP 2004,
404-411, Barcelona, Spain.
Mitrofanova, O.A. 2015. Verojatnostnoje modelirovanije tematiki russkojazychnyh
korpusov tekstov s ispol’zovanijem kompjuternogo instrumenta GenSim.
[Probabilistic topic modeling of the Russian text corpora by means of GenSim
toolkit]. In Trudy mezhdunarodnoj konferencii «Korpusnaja lingvistika –
2015», St.-Petersburg, Russia.
The time course of sociolinguistic influences on
wordlikeness judgments
James Myers, Tsung-Ying Chen
Graduate Institute of Linguistics, National Chung Cheng University, Taiwan
Abstract
This study examined how and when sociolinguistic factors affect wordlikeness
judgments by near-native bilinguals of Mandarin, the prestige language of Taiwan,
and Southern Min (Taiwanese). Auditory syllables nonlexical in both languages
were recorded by two bilingual speakers, one with a S. Min accent and one with a
Mandarin accent. Accent and target language (judging the syllables as Mandarin-like
or as S. Min-like) were crossed across participant groups. Binary judgments
collected via the Worldlikeness Web app were analyzed in terms of target language,
accent, participant gender, Mandarin and S. Min neighbourhood density, and
reaction time. Response patterns were affected by all of these variables, including
reaction time, in ways consistent with the differing social status of the two
languages.
Key words: wordlikeness, neighbourhood density, bilingualism, gender, time course
Introduction
Mandarin is the prestige language in Taiwan, though many speakers are also
native speakers of Southern Min (Taiwanese), another Sinitic language, even
if, as adults, they may be more fluent in Mandarin. This social situation
raises psycholinguistic questions: how and when is the phonological
processing of near-native bilinguals affected by sociolinguistic variables like
language status, gender (given that women are expected to favour the
prestige norm; Labov, 2001) and accent (given that S. Min-accented
Mandarin is expected to be disfavoured; Chung, 2006)?
To find out, we conducted a wordlikeness judgment task in which
speakers rated the acceptability of nonlexical items as possible words in
Mandarin or in Southern Min. Since this task is sensitive to neighbourhood
density (the number of lexical items minimally different from a test item;
Bailey and Hahn, 2001), and the influence of neighbourhood density
increases over time (Stockall, Stringfellow, and Marantz 2004), we were also
interested to see how the social variables interacted with neighbourhood
density (including in the non-target language: Frisch and Brea-Spahn 2010),
as modulated by reaction time (since slower responses may be sensitive to
later processes).
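The notion of neighbourhood density used here can be made concrete with a small sketch. It assumes that "minimally different" means a single segment substitution, deletion or insertion. The segment transcriptions and the toy lexicon are hypothetical and are not taken from the Mandarin or S. Min lexicons used in the study.

```python
def differs_by_one_edit(a, b):
    """True if two segment sequences differ by one substitution, deletion or insertion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = sorted((a, b), key=len)
    # Deleting one segment from the longer item must yield the shorter one.
    return any(longer[:k] + longer[k + 1:] == shorter for k in range(len(longer)))


def neighbourhood_density(item, lexicon):
    """Number of lexical items that are one edit away from the test item."""
    return sum(differs_by_one_edit(item, word) for word in lexicon)


# Hypothetical lexicon of syllables, each written as a tuple of segments.
toy_lexicon = [("m", "a"), ("p", "a"), ("m", "a", "n"), ("t", "i"), ("m", "i")]
test_item = ("p", "a", "n")  # nonlexical in the toy lexicon
print(neighbourhood_density(test_item, toy_lexicon))
```

For a bilingual design the same function would be called once per lexicon, giving one density value for the target language and one for the non-target language.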
The lexicons of Mandarin and S. Min share crucial similarities: most
morphemes are cognates across these languages, morphemes are virtually
Methods
We used an auditory wordlikeness judgment task.
Procedure. Depending on which of the four groups they were assigned to,
participants were asked to judge syllables that were or were not S. Min-
accented as being like Mandarin or like S. Min. The Web app Worldlikeness
(Chen and Myers forthcoming; http://lngproc-4083.nitrouspro.com:3000/)
was used to present the stimuli in a different random order for each
participant. Responses were made by pressing either the ‘L’ key (like the
target language) or the ‘S’ key (not like it). Trials ended if a response was
received, or else after 4,000 ms. Both responses and reaction times (RT)
from stimulus onset were recorded. Experimental parameters and results are
available for download from the Worldlikeness website.
Conclusions
Our bilingual wordlikeness judgment study confirms that gender, accent, and
the social status of languages all influence real-time phonological
processing. In particular, judgments for the more prestigious language were
more critical, were hurt by neighbours in the less prestigious language
(especially for women), and may have been processed more deeply (perhaps
an indirect effect of the participants’ lower fluency in the less prestigious
Figure 1. The effects of target language, reaction time, and neighbourhood density
(left: Mandarin, right: S. Min) on wordlikeness judgments.
Acknowledgements
This study was supported by Ministry of Science and Technology (Taiwan) grant
MOST 103-2410-H-194-119-MY3.
References
Bailey, T. M. and Hahn, U. 2001. Determinants of wordlikeness: Phonotactics or
lexical neighborhoods? Journal of Memory and Language 44, 568-591.
Chen, T.-Y. and Myers, J. Forthcoming. Worldlikeness: A Web-based tool for
typological psycholinguistic research. Proc. of the 40th Annual Penn Ling.
Conf., Philadelphia, USA.
Chung, K. S. 2006. Hypercorrection in Taiwan Mandarin. Journal of Asian Pacific
Communication 16, 197-214.
Frisch, S. A. and Brea‐Spahn, M. R. 2010. Metalinguistic judgments of
phonotactics by monolinguals and bilinguals. Laboratory Phonology 1,
345‐360.
Labov, W. 2001. Principles of Linguistic Change, vol. 2: Social Factors. Oxford,
Blackwell.
Stockall, L., Stringfellow, A., and Marantz, A. 2004. The precise time course of
lexical activation: MEG measurements of the effects of frequency, probability,
and density in lexical decision. Brain & Language 90, 88-94.
The function of olfactory experience in
reasoning: An empirical study
Katalin Nagy
Department of Languages, University of Jyväskylä, Finland
Abstract
This study reports on the role of olfactory experience (i.e. the smell of medication)
in a nine-year-old girl’s reasoning in a pair-work situation in which the children were
asked to choose items useful on a desert island. The extract analysed here is part of
the larger data set of my dissertation, in which I investigate how sensory-motor
activities are involved in reasoning. I video-recorded an experimental task in which
the participants (N=27; age=9; Hungarian L1) were asked to choose 7 items out of
14 to take to an imaginary uninhabited island. The multimodal analysis shows
that the children did not choose the vitamin pills because of their unpleasant smell.
The findings suggest that crossmodal experiences can be structural elements of
reasoning.
Key words: multimodal analysis, sensory-motor activities, children’s reasoning
Introduction
The distributed view of language has become a widely used term in applied
linguistics. Most often it is used to refer to the bodily, ecologically, socially or
situationally distributed nature of language (Streeck, Goodwin & LeBaron
2011). During the last two decades, a great deal of research has been
conducted on the embodied, visible aspect of interaction. In the last five
years, kinetic behaviour, especially the use of gestures, has been studied in a
variety of contexts, including children’s reasoning (e.g. Alibali et al. 2011,
2014; Ehrlich et al. 2006). However, the function of sensory perception and
motor activity during reasoning has been under-researched so far. Current
investigations suggest that the cross-sensory experience of the world is
created on the basis of the interrelation between different sensory perceptions
(Fulkerson 2014; Calvert & Thesen 2004; Ernst et al. 2007). Nevertheless,
we have little information about how olfactory experiences are connected to
body movements and verbal utterances when people interact. To fill this gap
in the research, I explore how the experience of smell was integrated into
children’s reasoning about the possible need for vitamin pills on a desert
island.
Method
Data collection
Data collection took place in the hobby room of a Hungarian elementary
school over a period of three weeks, during the afternoon day-care service. The
multimodal data include video recordings of children completing a desert
island task. In this activity the students were asked to choose 7 objects out of
14 to take with them to an imaginary uninhabited island. The task was
completed in pairs, in which the children were asked to make a shared choice.
Furthermore, participants were asked individually and in pairs to justify their
choices in an interview conducted by the researcher. In this paper I analyse a
unique extract of pair work in which the children smelled the vitamin pills while
they were reasoning about their necessity.
The children and their parents were informed about the research task and
the use of data in advance and their permissions were collected according to
the Ethical Regulation of the University of Jyväskylä¹. Further, I used
pseudonyms and I blurred the video extracts in order to ensure the
participants’ privacy.
Participants
Altogether, 27 fourth-grade students from two classes completed the desert
island activity. In this study I analyse the pair work of Janka and Orsi, since
they smelled one of the task objects (vitamin pills) while they were solving
the task. The children recreated their olfactory experience at the verbal,
visual and kinetic levels of reasoning while they negotiated and made their
decision whether they should or should not take the pills.
¹ Principles of research data management at the University of Jyväskylä, 2014.
https://www.jyu.fi/tutkimus/tutkimusaineistot/rdmenpdf (accessed on 21 May 2016).
Table 1. Transcript.
Results
The micro-level observation of the data indicated that smell and vision in
connection to the synchronised movements of heads, upper bodies, limbs
and verbal processes were integrated while children were negotiating about
the necessity of vitamin pills. Janka recycled the experience of smell to make
her justification meaningful when she pushed the pills under Orsi’s nose.
Her decision was indicated bodily when she put the pills among the
unnecessary objects. Finally, she summarised the action in a verbal utterance
(‘stinky’/ ‘büdös’).
Although there is a constant search for underlying mental processes which
may regulate human argumentation (e.g. Johnson-Laird, Khemlani and
Goodwin, 2015), the findings of this paper suggest that a wide range of
cross-sensory experiences have meaningful functions in reasoning. Nevertheless,
linguistic research on the connection between smell and meaning-making
has just started (Pennycook and Otsuji 2015) and the findings of my case
study are also limited. Therefore, further studies are needed to explore how
olfactory experiences contribute to reasoning.
References
Fulkerson, M. 2013. Explaining multisensory experience. In Brown, R. (ed.) 2013,
Consciousness inside and out: Phenomenology, neuroscience and the nature of
experience, 365-373. Dordrecht: Springer.
Johnson-Laird, P. N., Khemlani, S. S. and Goodwin, G. P. 2015. Logic, probability,
and human reasoning. Trends in Cognitive Sciences, 19(4), 201-214.
Levinson, S. C. and Holler, J. 2014. The origin of human multi-modal
communication. Philosophical Transactions of the Royal Society of London.
Series B, Biological Sciences, 369 (1651), 2013030.
Pennycook, A. and Otsuji, E. 2015. Making scents of the landscape. Linguistic
Landscape 1(3), 191–212.
Gender features in German: Evidence for
underspecification
Andreas Opitz, Thomas Pechmann
Institut für Linguistik, Leipzig University, Germany
Abstract
A series of behavioural experiments is reported that investigate the processing of
grammatical gender of nouns in German. Results consistently indicate processing
differences between nouns of different genders. Masculine nouns show indications
of increased processing cost compared to feminine nouns. We assume that the
lexical representation of nouns is characterized by underspecified gender
information. This assumption is in contrast to more traditional views stating that
only inflected forms are underspecified with respect to grammatical features.
However, the presented account supports the idea that underspecification as a
general characteristic of the mental lexicon is mainly driven by economical reasons:
a feature that is never used for grammatical operations (e.g., evaluation of
agreement) is not needed in the language system at all.
Key words: grammatical gender, underspecification, German, mental lexicon
Background
In models of language processing, grammatical categories (e.g., gender or
case) are traditionally split into distinct classes. For example, grammatical
gender in German distinguishes masculine, feminine and neuter. Current
morphological theories, however, propose more differentiated analyses of
these categories. Almost all frameworks rely on abstract feature
decomposition and the concept of underspecification (see, e.g., Distributed
Morphology (cf. Halle & Marantz, 1993), Paradigm Function Morphology
(Stump, 2001), Minimalist Morphology (Wunderlich, 1996), and many
others). The overall idea behind these two concepts is a decomposition of
traditional labels into more abstract, binary features, which makes it possible to
refer to natural classes of such categories. Accordingly, the three instances of
grammatical gender in German can be described by the following two
abstract binary features [±f] and [±m]: ‘feminine’ [+f, −m], ‘masculine’ [−f,
+m], ‘neuter’ [−f, −m]. In contrast, psycholinguistic models of inflection
consistently lack such differentiated morphological analyses. This
holds for models as diverse as schema-based models (Bybee, 1995),
variants of connectionist models (cf. Rumelhart & McClelland, 1982), serial
modular models (Levelt, Roelofs, & Meyer, 1999), the Augmented
Addressed Morphology Model (Caramazza, Laudanna, & Romani, 1988),
and others. However, relevant reason to implement the notions of
Results: Main effect for Gender (F1 (2, 58) = 3.55, p < .05; F2 (2,83) =
3.90, p < .05). Decisions for feminine nouns were faster (686 ms) than for
masculine nouns (720 ms). Neuter nouns scored numerically in between
(703 ms) and did not differ statistically from either feminine or masculine
nouns.
Discussion
In all reported experiments we obtained evidence that gender features of
nouns have an impact on language processing in German. Consistently,
masculine nouns induced longer reaction times and partially lower accuracy
rates, both indicating increased processing demands for masculine nouns,
compared to members of the feminine category. We assume that the
observed effects are grounded in an underspecified representation of
grammatical features. In contrast to previous accounts, both in theoretical
linguistics and psycholinguistics, we propose that the notion of
underspecification extends to the representation of gender features of nouns
in the mental lexicon. More precisely, we assume gender features of German
nouns to be lexically specified as follows: masculine nouns: [−f, +m]; neuter
nouns: [−f]; feminine nouns: [ ]. Moreover, the proposed specifications not
only match the present data, but also agree with existing accounts of
inflectional morphology (Blevins, 1995). These specifications can
alternatively be modelled as generic gender nodes in an activation based
model (cf. Levelt et al. 1999). Traditionally, generic gender nodes are
viewed as categorical instances of grammatical gender. Each noun is
associated with one of these nodes. In contrast, an underspecification-based
account predicts that nouns in the mental lexicon differ in the number of
associations to feature nodes (see Figure 1). Thus, the number of these
associations corresponds to processing costs as mirrored in reaction times
and error rates in behavioural experiments. This assumption
straightforwardly leads to further predictions concerning, e.g., priming
experiments by providing a possible explanation why the so-called gender
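The lexical specifications proposed in this Discussion can be written down as a small data structure, which makes the predicted ordering of processing costs easy to check. This is only an illustrative sketch: the dictionary layout and the names are hypothetical, and the reaction times are the means reported in the Results paragraph above, where neuter did not differ statistically from the other two genders.

```python
# Underspecified lexical gender entries as proposed in the Discussion.
LEXICAL_GENDER = {
    "masculine": {"f": "-", "m": "+"},  # [-f, +m]
    "neuter": {"f": "-"},               # [-f]
    "feminine": {},                     # [ ]
}

# Mean decision times in ms as reported in the Results paragraph.
REPORTED_MEAN_RT_MS = {"feminine": 686, "neuter": 703, "masculine": 720}


def feature_count(gender):
    """Number of specified features, read here as a proxy for processing cost."""
    return len(LEXICAL_GENDER[gender])


by_features = sorted(LEXICAL_GENDER, key=feature_count)
by_rt = sorted(REPORTED_MEAN_RT_MS, key=REPORTED_MEAN_RT_MS.get)
print(by_features, by_rt, by_features == by_rt)
```

Under this reading, the ordering by number of specified features coincides with the numerical ordering of the reported means.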
References
Blevins, J. 1995. Syncretism and paradigmatic opposition. Linguistics and
Philosophy, 18, 113–152.
Bybee, J. 1995. Regular morphology and the lexicon. Language and Cognitive
Processes, 10(5), 425–455.
Caramazza, A., Laudanna, A., & Romani, C. 1988. Lexical access and inflectional
morphology. Cognition, 28, 297–332.
Clahsen, H., Eisenbeiss, S., Hadler, M., & Sonnenstuhl, I. 2001. The mental
representation of inflected words: An experimental study of adjectives and verbs
in German. Language, 77, 510–543.
Friederici, A. D., & Jacobson, Th. 1999. Processing grammatical gender during
language comprehension. Journal of Psycholinguistic Research, 28, 467–484.
Halle, M., & Marantz, A. 1993. Distributed morphology and the pieces of inflection.
In K. Hale & S. J. Keyser (Eds.), The View from Building 20. Essays in
Linguistics in Honor of Sylvain Bromberger. Vol. 24 of Current Studies in
Linguistics (pp. 111–176). Cambridge, Mass.: MIT Press.
Levelt, W. J. M., Roelofs, A., & Meyer, A. S. 1999. A theory of lexical access in
speech production. Behavioral and Brain Sciences, 22(1), 1–75.
Opitz, A., Regel, St., Müller, G., & Friederici, A. D. 2013. Neurophysiological
evidence for morphological underspecification in German strong adjective
inflection. Language, 89(2), 231–264.
Penke, M. 2006. Flexion im mentalen Lexikon. Tübingen: Max Niemeyer.
Rumelhart, D. E., & McClelland, J. L. 1982. An interactive activation model of
context effects in letter perception: Part 2. The contextual enhancement effect
and some tests and extensions of the model. Psychological Review, 89, 60–94.
Stump, G. 2001. Inflectional Morphology. Cambridge: Cambridge University Press.
Wunderlich, D. 1996. Minimalist morphology: The role of paradigms. In: G. Booij
& J. van Marle (Eds.), Yearbook of Morphology 1995 (pp. 93–114). Dordrecht:
Kluwer.
Distributional analysis of Russian lexical errors
Polina Panicheva
Department of Mathematical Linguistics, Saint Petersburg State University, Russia
Abstract
An algorithm for analyzing obscure lexical collocations is proposed. It is based on a
co-occurrence model and distributional semantic filtering. We apply the proposed
technique to lexical errors of construction blending, as annotated in the Corpus of
Russian Student Texts. Results of error processing are analyzed and classified;
reasons for different results in the paraphrasing experiment are discussed.
Keywords: Distributional Semantics, lexical errors, construction blending, Russian.
Introduction
We propose a framework for analyzing violation of syntagmatic relations
resulting in construction blending [Puzhaeva et al. 2015]. Our toolkit
includes models of meaning and selectional restrictions, applied to analyzing
different types of abnormal collocations: native speakers’ and learners’
errors, metaphorical expressions, peculiarities in clinical texts, etc. The
algorithm makes it possible to identify and correct obscure collocations. We discuss the
application of our approach to a corpus of native speaker errors.
Datasets
As a training corpus we use the RNC-Sketches syntactic bigram statistics. It
provides statistics on syntactic relations in the Russian National Corpus
(RNC), where every keyword is associated with a list of its relations and
their frequencies as parsed with MaltParser and TreeTagger; the same tools, used
to create RNC Sketches [Sharoff 2008, Sharov 2011], are applied to the testing data.
Total word frequencies were obtained from the Russian Frequency
Dictionary [Lyashevskaya, Sharov 2009]. We supply our algorithm with an
RNC-based Word2Vec semantic model [Kutuzov, Andreev 2015].
The data used for automatic error analysis is provided by the Corpus of
Russian Student Texts (CoRST). It contains educational texts by native
speakers of Russian and is annotated with different types of errors. The
errors caused by construction blending [Puzhaeva et al. 2015] are especially
relevant to our task, as they present subtle violations of selectional
restrictions.
Statistical models
We use the RNC-Sketches syntactic bigrams as the syntactic model and
apply automatic ranking of the erroneous keywords based on their context.
The list of possible substitutes for a particular keyword is the intersection of
the words occurring with every syntactic relation in the keyword context.
The substitutes are ranked using the association measure scores: context-
based paraphrasing (CBP) [Shutova 2010], and Word2Vec-based semantic
scoring [Kutuzov, Andreev 2015].
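The candidate generation step described in this section amounts to a set intersection, which can be sketched as follows. The table layout, the relation labels and the English toy counts are hypothetical stand-ins for the RNC-Sketches bigram statistics.

```python
def candidate_substitutes(context, cooccurrence):
    """Words attested with every (context word, relation) pair of the keyword."""
    attested = [set(cooccurrence.get(pair, {})) for pair in context]
    return set.intersection(*attested) if attested else set()


# Hypothetical bigram statistics: (context word, relation) -> {keyword: frequency}.
cooccurrence = {
    ("make", "dobj"): {"decision": 40, "choice": 20},
    ("important", "amod"): {"decision": 30, "choice": 5, "role": 50},
}
# Context of the erroneous keyword: its governing verb and its modifier.
context = [("make", "dobj"), ("important", "amod")]
print(sorted(candidate_substitutes(context, cooccurrence)))
```

A word such as "role", which is attested with only one of the two context relations, drops out of the candidate list before any ranking takes place.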
Context-based paraphrasing
The context-based paraphrasing (CBP) likelihood estimation is based on the
same grounds of syntactic co-occurrence, but is not symmetric and does not
account for context word frequencies:
(1) $L_i(\mathrm{CBP}) = \dfrac{\prod_{n=1}^{N} f(w_n, r_n, i)}{(f(i))^{N-1}}$
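A worked example may help with equation (1). The sketch below reads it as the product of the co-occurrence frequencies f(w_n, r_n, i) of candidate i with each of the N context words w_n in relation r_n, divided by the total frequency f(i) raised to the power N-1. That reading, the function name and the toy English counts are assumptions made for illustration.

```python
from math import prod


def cbp_likelihood(candidate, context, cooccurrence, total_freq):
    """Equation (1): product of f(w_n, r_n, i) over the context, over f(i)^(N-1)."""
    n = len(context)
    numerator = prod(
        cooccurrence.get(pair, {}).get(candidate, 0) for pair in context
    )
    return numerator / total_freq[candidate] ** (n - 1)


# Hypothetical counts: (context word, relation) -> {candidate: frequency}.
cooccurrence = {
    ("make", "dobj"): {"decision": 40, "choice": 20},
    ("important", "amod"): {"decision": 30, "choice": 5, "role": 50},
}
total_freq = {"decision": 200, "choice": 100, "role": 300}
context = [("make", "dobj"), ("important", "amod")]

scores = {
    word: cbp_likelihood(word, context, cooccurrence, total_freq)
    for word in total_freq
}
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(word, score)
```

Dividing by f(i) penalises candidates that are frequent overall, so a rarer word with strong co-occurrence in the given context can outrank a very common one.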
Experiment setting
We perform a proof-of-concept experiment by analyzing the errors caused
by construction blending in CoRST with context-based paraphrasing and
additional Word2Vec semantic scoring. The errors are made by native
speakers and represent violations of selectional restrictions. There are 27
sentences in the corpus annotated with a noun presenting a lexical
construction blending error. We set out to automatically suggest a list of
substitutes for the erroneous nouns and score them according to the CBP
procedure with Word2Vec semantic filtering.
The results are manually analyzed, and the errors are grouped according
to their proposed substitution candidates. The first group contains errors for
which the distributional algorithm proposed no relevant candidates. For the
Conclusions
The distributional approach to lexical errors is an adequate measure of the
distributional specificity of a construction in text; it also presents a useful
tool which automatically suggests lexical substitutes for unusual lexical
co-occurrences. Where lexical substitution is impossible, manual analysis
confirms no lexical error in the sentence (44%). Proposed lexical substitutes
(56%) are correct in 60% and 87% of cases in strict and loose mode, respectively.
Future work includes modifying the morphosyntactic analysis to
minimize parsing errors. Future applications of the approach include specific
error collections, i.e. language acquisition and learner errors, clinical texts, in
order to shed light on their distributional nature.
Acknowledgements
The reported study is supported by RFBR grant 16-06-00529.
References
Kutuzov, A., Andreev, I. 2015, 'Texts in, meaning out: neural language models in
semantic similarity task for Russian', arXiv preprint arXiv:1504.08183.
Lyashevskaya, O., Sharov, S. 2009, The Frequency Dictionary of Modern Russian
(on the materials of the Russian National Corpus), Moscow. (in Russian)
Puzhaeva, S., Zevakhina, N., Dzhakupova, S. 2015, Construction blending in non-
standard variants of Russian in the Corpus of Russian Student Texts. Proc. 6th
Intern. Conf. “Corpus Linguistics-2015”, 390-397. St. Petersburg. (in Russian)
Sharoff, S., Kopotev, M., Erjavec, T., Feldman, A., Divjak, D. 2008, Designing and
Evaluating a Russian Tagset, in 'LREC'.
Sharov, S., Nivre, J. 2011, The proper place of men and machines in language
technology. Processing Russian without any linguistic knowledge. Proc. Annual
Intern. Conf. Dialogue, Computational Linguistics & Intellectual Technologies,
pp. 657.
Shutova, E. 2010, Automatic metaphor interpretation as a paraphrasing task, in
'Human Language Technologies: The 2010 Annual Conference of the North
American Chapter of the ACL', pp. 1029--1037.
Serbian pitch accents in tri-syllables produced by
Serbian and Russian speakers
Ekaterina Panova
Dept of history and theory of language, St. Tikhon’s Orthodox University, Russia
Abstract
This study is based on the analysis of tri-syllables in initial, medial and final position
of statements. For each syllable of the tri-syllables a set of pitch parameters was
calculated, as well as F0 inter-syllable intervals. In falling accents (FA) pitch
parameters reach maximum values on first syllables and in rising accents (RA) on
second ones. FA and RA differ more in initial than in medial position and tend
toward neutralization in final position. In initial and medial position Russian
speakers realize a “type of accent” that is similar to Serbian RA and in final
position a “type of accent” that is similar to Serbian FA.
Key words: pitch parameters, pitch accent, Russian, Serbian, tri-syllable.
Introduction
Traditionally, Serbian stress is characterised by two contrasts, pitch
(falling/rising) and duration (long/short), which yield four combinations: long
rising (LR), long falling (LF), short rising (SR) and short falling (SF).
Nevertheless, this clear-cut classification of Serbian pitch accents, established
by the end of the 19th century, has given rise to much discussion (see Lehiste,
Ivic 1986, Keijsper 1987, Jokanovic-Mihajlov 2006). The main problems
concern the distinctive parameters of falling (FA) and rising accents (RA).
Recent investigations confirmed that standard Serbian pitch contrasts are
realized on the sequence of stressed and post-tonic syllable(s): negative
intervals between the stressed and post-tonic syllable are typical of FA, and
positive intervals of RA; FA have early peak locations, RA late ones.
Our studies (Panova 2015, Panova 2016) supported these previous
investigations and revealed that in di-syllables the post-tonic syllable provided
a better FA/RA distinction than the stressed one. The parameter of peak location
(i.e. timing of F0 maximum) can provide the FA/RA distinction only with
respect to the pitch contour of the whole word, not with respect to the
stressed syllable alone. Russian speakers had difficulties in producing the
Serbian FA/RA contrast: in non-final position of statements they
produced “types of accents” that were similar to Serbian RA.
Method
For the present study, 42 tri-syllabic words with stress on the first syllable
and different types of accents were selected. Each target tri-syllable word
Results
The results for Serbian speakers showed that the main effects of
SYLLABLE and ACCENT as well as the interaction between SYLLABLE and
ACCENT were highly significant (p<0.0001) for F0 start value, F0 end
value, F0 maximum, F0 minimum and F0 mean value in initial and medial
position (Figure 1 gives an example for F0 start values). All these
pitch parameters show the same tendencies: FA reach maximum
values on the first syllable, with a gradual decrease over the second and third
syllables, while RA reach minimum values on the first syllable (for F0 end value,
on the third syllable) and maximum values on the second syllable. Post hoc tests
showed that with regard to these pitch parameters the four Serbian accents fall
mostly into two types, falling (LF and SF) and rising (LR and SR); within
these types there is no significant difference. At the same time, Serbian
accents differ more clearly in initial than in medial position, where the
distinction between FA and RA breaks down because SF values on the second
syllable approach the values of RA. With regard to syllables, Serbian accents
differ more on the first and second syllables than on the third.
In final position these pitch parameters did not show a significant
main effect of ACCENT, although the effect of
SYLLABLE and the interaction between ACCENT and SYLLABLE were
significant (p<0.001).
The results for F0 range and timing of F0 maximum for Serbian speakers
were not significant in any position.
Figure 1. F0 start value scores of the first (1), second (2) and third (3) syllables with
LF, LR, SF and SR for Serbian and Russian speakers in initial, medial and final
position.
For Russian speakers the main effect of ACCENT was not significant
for any pitch parameter, except for marginally significant results for F0
start value (p=0.043) and F0 maximum (p=0.047) in final position. However,
for Russian speakers the main effect of SYLLABLE was highly significant
(p<0.0001) for all parameters except F0 range in all positions. As
Figure 1 shows, in initial and medial position the values of pitch
parameters for Russian speakers are similar to Serbian RA: maximum values
are reached on the second syllable and minimum values on the first. In
final position, by contrast, the values of pitch parameters for Russian
speakers are similar to FA.
Figure 2. F0 inter-syllable interval scores between first and second (1) and second
and third (2) syllables with LF, LR, SF and SR for Serbian and Russian speakers in
initial, medial and final position.
interval between second and third syllable is not significant for FA/RA
distinction (see Figure 2). FA have smaller intervals between first and
second syllable than RA. For Russian speakers the values of interval
between first and second syllable are similar to Serbian RA.
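A minimal sketch of how an F0 inter-syllable interval can be computed and its sign read off is given here for orientation. It assumes that the interval is expressed in semitones and is taken from one representative F0 value per syllable; the unit, the choice of F0 value and the function names are assumptions, not the procedure used in this study.

```python
from math import log2


def interval_semitones(f0_from_hz, f0_to_hz):
    """Signed interval in semitones from one syllable's F0 to the next."""
    return 12 * log2(f0_to_hz / f0_from_hz)


def interval_sign(f0_stressed_hz, f0_posttonic_hz):
    """Label the direction of the stressed-to-post-tonic interval."""
    st = interval_semitones(f0_stressed_hz, f0_posttonic_hz)
    if st < 0:
        return "negative"  # the pattern reported as typical of FA
    if st > 0:
        return "positive"  # the pattern reported as typical of RA
    return "level"
```

For example, a drop from 220 Hz on the stressed syllable to 200 Hz on the post-tonic one yields a negative interval, whereas a step up from 200 Hz to 220 Hz yields a positive one.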
Conclusion
The pitch parameters of the tri-syllables produced by Serbian
speakers showed that the FA/RA distinction is carried by all three syllables,
although the first and second syllables contribute more than the third. For
FA the pitch parameters (F0 start value, F0 end value, F0 maximum, F0
minimum, F0 mean value) reach maximum values on the first syllable and
minimum values on the third, while for RA the pitch parameters reach
maximum values on the second syllable and minimum values on the first (except
for F0 end value). The FA/RA contrast is realized more clearly in initial than in
medial position and tends toward neutralization in final position. In medial
position the pitch parameter values of the second syllable for SF
approach the values for RA, which is consistent with the tonal
prominence of SF (Lehiste, Ivic 1986). The FA/RA contrast can also be observed
in the F0 inter-syllable intervals between the first and second syllable:
RA have larger intervals than FA. The values of F0 range and timing of F0
maximum did not distinguish FA from RA.
With regard to the analyzed pitch parameters, Russian speakers realize a “type of
accent” that is similar to Serbian RA in initial and medial position and a
“type of accent” that is similar to Serbian FA in final position.
References
Jokanovic-Mihajlov J. 2006. Akcenat i intonacija govora na radiju i televiziji.
Beograd.
Keijsper, C.E. 1987. Studying Neoštokavian Serbocroatian Prosody. Dutch Studies in
South Slavic and Balkan Linguistics. SSGL 10, 101-193.
Lehiste I., Ivic P. 1986. Word and sentence prosody in Serbocroatian. Cambridge,
Mass., MIT Press.
Panova E. 2015. Realization of Serbian accents by Serbian and Russian speakers
(analysis of pitch parameters). Proc. International Conference of Experimental
Linguistics ExLing 2015, 26-27 June 2015, Athens, Greece, 58–61.
Panova E. 2016. L1 and L2 Serbian accents: Analysis of Pitch Parameters. Proceedings
of the Speech Prosody 2016, May 31 - June 3, 2016, Boston, MA, USA, 474–478.
Smirnova, N., Starshinov A., Oparin I. & Goloshchapova T. 2007. Speaker
Identification Using selective Comparison of Pitch Contour Parameters. Proc.
16th ICPhS, Saarbrucken, 203–206.
Effect of saliency and L1-L2 similarity on the
processing of English past tense by French
learners: an ERP study
Maud Pélissier1, Jennifer Krzonowski2, Emmanuel Ferragne1
1 Laboratoire CLILLAC-ARP, EA 3967, Université Paris Diderot, France
2 Laboratoire DDL, UMR 5596, CNRS – Université Lyon 2, France
Abstract
This study explored the effect of saliency and L1-L2 similarity on the processing of
second language morphosyntax. ERP responses to violations of past tense
morphology were obtained from adult intermediate French learners of English.
Results show that participants processed L2-specific violations as salient events and
not as morphosyntactic incongruities.
Key words: ERPs, L2 processing, syntax, L1-L2 similarity, saliency
Introduction
The way the syntax of our first language (L1) interacts with the syntax of a
language we are trying to learn (L2) remains a much debated issue in the
field of SLA. Some of the possible facilitating factors include the presence
of similar structures in the L1 and the saliency of the morphosyntactic
structure under scrutiny in the L2 (MacWhinney, 2005). In this study, we
focused on a structure that pits these two factors against each other: we
recorded ERP responses of French learners of English to morphosyntactic
violations of the past tense in polar questions with the auxiliaries DID and
HAD. Polar questions using HAD followed by a past participle work in a way that
is similar to French, where the past tense is marked both on the auxiliary and
the main verb. In contrast, questions with DID are specific to English in that
the past tense is marked only on the auxiliary. However, violations of
past-tense inflection are phonologically more salient with DID, where a past
morpheme is added to the main verb, than with HAD.
Methods
Participants
Twenty-six intermediate French learners of English (5 male, aged 18.5 ± 1) took
part in the experiment. They were first-year university students of English
who had spent less than a month in an English-speaking country.
Results
Behavioural measures: the GJT
A sensitivity index (d’) was computed for each participant and each
auxiliary. Analyses showed that the participants’ d’ was marginally higher in
the HAD condition (F(1,25)=3.48, p=.07) but their response time was shorter
with DID (F(1,25)=7.98, p<.01): on average, it took them 562 ms to
respond to sentences containing DID and 634 ms to sentences containing
HAD.
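For readers unfamiliar with the sensitivity index, the following sketch shows one common way of computing d’ from grammaticality judgement counts: the z-transformed hit rate minus the z-transformed false-alarm rate. It assumes that a "hit" is an ungrammatical sentence judged incorrect and a "false alarm" a grammatical sentence judged incorrect, and it applies a log-linear correction to avoid rates of exactly 0 or 1; both choices and the function name are assumptions, not necessarily the procedure used in this study.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count, 1 to each total)
    keeps both rates strictly between 0 and 1.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return z(hit_rate) - z(fa_rate)
```

A participant who responds at chance (equal hit and false-alarm rates) obtains d’ = 0, and larger positive values indicate better discrimination between correct and incorrect sentences.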
EEG results
A repeated-measures ANOVA with mean amplitude in the P600 window as
dependent variable and Condition (Correct / Incorrect), Auxiliary (DID /
HAD), Hemisphere (Left / Right) and Region (Anterior / Posterior) as
within-subject variables showed an effect of the interaction between
Condition and Auxiliary (F(1,28)=9.15, p<.01). Post-hoc analyses revealed
that the effect of Condition in this time window was limited to sentences
with DID (p<.001). A similar ANOVA was conducted on the mean
amplitude in the 300-500 ms window and an effect of the Condition ×
Auxiliary interaction (F(1,28)=25.68, p<.001) was found. Post-hoc analyses
revealed that with DID, the amplitude was more positive in the Incorrect than
in the Correct condition (p<.001), whereas with HAD, the amplitude was more
negative in the Incorrect than in the Correct condition (p<.001).
Discussion
Violations in the DID condition thus elicited a P600 as well as a positive
peak in the 300-500ms window, resembling a P3 component. These
violations involve the presence of the past morpheme in a context where it
should be absent. They are therefore more phonetically salient than
violations with HAD, which are due to the absence of this same morpheme.
These results are therefore consistent with the hypothesis that the P600
reflects, as the P3 does, the subjective salience of the stimulus (Sassenhagen,
Schlesewsky, & Bornkessel-Schlesewsky, 2014). Besides, polar questions
with DID represent a complex L2-specific structure, since they involve the
movement of the inflectional morpheme from the main verb (where it would
be in a declarative sentence) to the auxiliary. This represents an additional
processing cost; yet participants were faster to decide for these sentences.
This apparent discrepancy, as well as the presence of the P3, suggests that
the P600 effect observed here in the DID condition is not a reflection of a
better perception of the morphosyntactic error at hand but of an explicit
reaction to the superior saliency of this violation.
Violations in the HAD condition elicited a negativity in the 300-500ms
window that was not limited to anterior sites, thus more reminiscent of an
N400 than a LAN. N400 effects have been found to be elicited by
morphosyntactic violations even in native speakers (Tanner & Van Hell,
2014), possibly because those speakers rely more on lexico-semantic
information to process their native language. It thus seems that these
violations with HAD were not perceived as subjectively salient events but as
lexical violations.
These results suggest that when the processed structure does not exist in
the L1, other cues such as the phonological salience of the violation are used
to process morphosyntactic violations. These findings also have theoretical
relevance since they strongly support the P600-as-P3 hypothesis.
Acknowledgements
This research was supported by an IUF grant awarded to Dr. Emmanuel
Ferragne.
References
MacWhinney, B. 2005. Extending the Competition Model. International Journal of
Bilingualism, 9(1), 69–84.
Sassenhagen, J., Schlesewsky, M., & Bornkessel-Schlesewsky, I. 2014. The P600-
as-P3 hypothesis revisited: Single-trial analyses reveal that the late EEG
positivity following linguistically deviant material is reaction time aligned.
Brain and Language, 137, 29–39.
Tanner, D., & Van Hell, J. G. 2014. ERPs reveal individual differences in
morphosyntactic processing. Neuropsychologia, 56(1), 289–301.
Phonostylistic study of Spanish-speaking
politicians: Populist vs. conservative
Carmen Patricia Pérez
CLILLAC-ARP, Université Paris Diderot – Paris 7, France
Abstract
Conservative and populist politicians can be easily recognized thanks to their
phonostyle, which is characterized by specific prosodic patterns. In this study,
I analyzed the phonostyle of four politicians in public ‘spontaneous’ speeches:
Hugo Chávez (HC), José D. Ortega (JO), José R. Zapatero (Z) and Enrique Peña (EP).
The acoustic analysis suggests that two main types of phonostyle can be found:
a populist one (HC and JO) and a conservative one (Z and EP).
Introduction
Conservative and populist politicians have a particular and typical way of
speaking, their own ‘phonostyle(s)’, varying according to the different
‘phonogenres’ (specific conditions of productions such as interview, public
speech, etc.). They are easily recognizable by the public. Studies on French
politicians show that this recognition relies on prosodic features such as
prominence, acceleration, register change, breaks, etc. (Fónagy 1983; Duez 1997;
Touati 1995; Léon 1993; Martin 2012). I will describe the prosodic features used
by 4 Spanish-speaking politicians in public ‘spontaneous’ speeches: H. Chávez
(Venezuela), J. Ortega (Nicaragua), J. Zapatero (Spain) and E. Peña
(Mexico). This study is purely phonostylistic; I consider that the differences
observed are due to the social and political backgrounds and not to the
different varieties of spoken Spanish (Sosa 1999; Hualde & Prieto 2015).
Methodology
Corpus
The 4 realizations illustrated below come from ‘spontaneous’ public
speeches delivered by the 4 politicians. They may be considered as
representative of each speaker.
Intonation model
The interpretation of the prosodic analysis is based on Ph. Martin’s model
“Incremental Prosodic Structure” (1975-2015), in which rising and falling
contours contrast with each other, indicating a relation of dependency between
them that is triggered by the following contours, starting with the final one of
the utterance. These contours are realized on prosodic words (aka accent phrases,
groups of one or more words with only one stressed syllable). They are described
as follows: C0: Fall (very low) on the last stressed syllable and possibly on the
following unstressed syllables to signal the end of an utterance; C1: Rise,
above the glissando threshold (see the glissando formula in Rossi 1971,
correlated with the speed of the melodic change); C2: Non-final falling
contour, above the glissando threshold; Cn: ‘Neutralized’, i.e. slightly rising
or falling, with a shortened vowel, below the glissando threshold; Cc: fall-rise,
flat or slightly falling on the stressed syllable and rising on the following
unstressed one(s). Ch is phonetic and is used by HC; it falls very low (‘high dive’
and lengthening on the last syllable) at the end of each intonation phrase
(IP).
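To make the role of the glissando threshold concrete, the sketch below tests whether a melodic change is fast enough to count as a perceived glide. It uses an approximation that is frequently cited in the intonation literature, a threshold of 0.16/T² semitones per second for a vowel of duration T seconds; whether this matches the exact formula in Rossi 1971 is an assumption, so the coefficient is left as a parameter and the function names are hypothetical.

```python
def glissando_threshold(duration_s, coefficient=0.16):
    """Threshold rate of pitch change in semitones per second.

    Assumed form: coefficient / T**2, with T the vowel duration in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return coefficient / duration_s ** 2


def is_above_threshold(delta_semitones, duration_s, coefficient=0.16):
    """True if the pitch change is fast enough to be heard as a glide."""
    rate = abs(delta_semitones) / duration_s  # semitones per second
    return rate > glissando_threshold(duration_s, coefficient)
```

Under these assumptions, a 2-semitone rise over a 0.2 s vowel (10 semitones per second against a threshold of about 4) would fall in the range of C1, whereas a 0.5-semitone movement over the same vowel would stay below the threshold, as described for Cn.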
Acoustic analysis
After an initial perceptual analysis (Pérez 2014), the four politicians were
classified into two different groups: populist (HC and JO) and conservative (Z
and EP).
For EP, the contour frequently employed is also C1 on the stressed syllable,
with a following unstressed syllable that seldom rises and most of the time
falls a little (but never like HC or JO). In the prosodic structure there is
more contrast at the top and lower levels of the hierarchy.
References
Duez, D., 1997. Acoustic markers of political power. Journal of Psycholinguistic
Research, 26(6), 641-654.
Fónagy, I., 1983. La vive voix. Essais de psycho-phonétique. Paris: Payot.
Frota, S. & al., 2007. The phonetics and phonology of intonational phrasing in
Romance. Current Issues in Linguistic Theory, pp. 131-154.
Léon, P., 1993. Précis de phonostylistique. Parole et expressivité: Nathan.
Martin, P., 1975. Analyse phonologique de la phrase française. Linguistics 146, pp.
35-68.
Martin, P., 2010. Intonation in Political Speech: Ségolène Royal vs. Nicolas
Sarkozy. Rome, pp. 54-64.
Martin, P., 2015. The structure of spoken language. Intonation in Romance:
Cambridge University Press.
Rossi, M., 1971. Le seuil de glissando ou le seuil de perception des variations
tonales pour la parole. Phonetica, Volume 23, pp. 1-33.
Sosa, J. M., 1999. La entonación del Español: Su estructura fónica, variabilidad y
dialectología. Madrid: Cátedra.
Touati, P., 1995. Pitch range and register in French political speech. Proc. XIII
International Congress of Phonetic Sciences, Volume 4, pp. 244-248.
Experimental L2 text production with WinPitch
LTL
Darya Sandryhaila-Groth
LLF, UFR Linguistique, Paris-Diderot Paris 7, France
Abstract
Speech production of adults learning French as a second language in a non-
francophone environment will be discussed in this paper. The focus is mostly on the
prosody of French. Two groups of adult US native speakers used WinPitch Pro and
its WinPitch LTL version for teaching and learning a foreign language. Their
respective performances have been compared and evaluated.
Key words: Second language and prosody teaching, speech visualization.
Introduction
Oral performance in French as an L2 has long been neglected, especially its
suprasegmental but also its segmental aspects (Guimbretière
1994, 2000; Lauret 2007). Only recently have notable changes occurred for
learners of French, i.e., when authors of teaching methods began to take more
interest in phonetics and included several exercises of repetition,
discrimination, etc. in their textbooks of French (Abry 2009; Abry and
Chalaron 2011; Kamoun and Ripaud 2016).
Methodology
In this study, two groups of learners of French were analyzed. All of them
were native speakers of American English and had an intermediate level in
French. The first group of participants were university students at UCLA and
the second were adult students at the French language school Alliance
française.
In a first step, individual comments were provided to each of the
students after the instructor had listened to their individual recordings
with Audacity software. The students worked in groups, listened to each
other and interacted throughout the learning
process. They were able to give their opinion on the quality of a
student's repetition and on his or her phonetic/prosodic errors. In addition,
the instructor listened to and corrected the oral productions as well. To
simplify the repetition task, models of the sentences were played to the
students at reduced speed (70%), with the help of the WinPitch software. At
the end of a training period, a final recording of each of the students in both
groups was made with WinPitch LTL.
Hypothesis
The first hypothesis is that the first group of young university students at
UCLA (on average 28 years old) performs better in speech
production than the second group of adult students (on average 65 years old),
not only because of the age difference, but also because the first group
is learning French as a main subject in their university syllabus, while the
second is learning French mainly for pleasure and travel purposes.
The second hypothesis is that the real-time visualization during the
prosodic training with WinPitch helps the students improve the quality
and 'natural sounding' of their speech productions in French.
Corpus
The corpus includes recordings from a model French speaker and the
students from the two groups, all reading a short declarative text “Dimanche
en famille”, a text coming from a short story written by P. Léon.
In this paper, only one sample sentence out of the whole corpus is
analyzed: Elle aimerait bien une petite friture de poissons. “She would
like to eat some deep-fried fish.” Results from two male speakers of the first
group and two female speakers of the second group are shown in Figures 1-4
below.
WinPitch and L2 teaching
In this study we work with WinPitch LTL, a program developed for
language teaching and learning by Philippe Martin, and WinPitch Pro.
WinPitch LTL was first presented to potential users in Germain and Martin
(2000), and is innovative in its real-time visualization. It is designed as a
traditional language lab with two tracks: the students first listen to the model
speech and then try to reproduce it. The instructor can directly correct errors
in the student's repetition (suprasegmental and segmental) or add comments
for the next class. The instructor can also manipulate the F0 curve and use
different colorings to highlight, e.g., a rising/falling intonation or a final
intonation.
Figures 1-4. Figure 1: Group 2 student 1, WP Pro first recording (left) and WP LTL
final recording (right). Figure 2: Group 2 student 2, WP Pro first recording (left)
and WP LTL final recording (right). Figure 3: Group 1 student 1, WP Pro first
recording (left) and WP LTL final recording (right). Figure 4: Group 1 student 2,
WP Pro first recording (left) and WP LTL final recording (right).
Table 1.
Elle aimerait bien une petite   First rec.   Final rec.   First rec.   Final rec.
friture de poissons             AfSp.1gr.2   AfSp.1gr.2   AfSp.2gr.2   AfSp.2gr.2
Prosodic words                  -            -            +            +
Declarative (C0d)               +            +            +            +
Prominence “bien”               +            +            +            +
Speech fluency & v.linking      -            -            +            +
Table 2.
Elle aimerait bien une petite   First rec.   Final rec.   First rec.   Final rec.
friture de poissons             Sp.1gr.1     Sp.1gr.1     Sp.2gr.1     Sp.2gr.1
Prosodic words                  -            +            -            +
Declarative (C0d)               +(!)         +            +            +
Prominence “bien”               +            +            +            +
Speech fluency & v.linking      -            +            -            -/+
Conclusions
The results for the sample sentence presented here suggest a clear improvement in
speech production for the students in both groups after training with
WinPitch LTL. In a next step, we will continue to analyze the full corpus, read
by other speakers from the two groups, to confirm the hypothesis that
language proficiency depends on the purpose pursued, the wish to sound
more natural, and awareness of the foreign language intonation while speaking.
References
Abry, D., Chalaron, M-L. 2009. Les 500 Exercices de phonétique A1/A2. Hachette.
Germain, A., Martin, P. 2000. Présentation d’un logiciel de visualisation pour
l’apprentissage de l’oral en langue seconde. www.alsic.org, 3, No 1, 61–76.
Guimbretière, E. 1994. Phonétique et enseignement de l'oral. Paris, Didier-Hatier.
James, E. 1976. The acquisition of prosodic features of speech using a speech
visualizer. IRAL 14 (3):227–243.
Kamoun, Ch., Ripaud, D. 2016. Phonétique essentielle du français. 100% FLE,
Paris, Didier.
Lauret, B. 2007. Enseigner la prononciation du français: questions et outils. Paris,
Hachette.
Martin, Ph. 1982. Utilisation d'un visualiseur de mélodie en vue d'une didactique.
Options nouvelles en didactique du français langue étrangère. 181–186. Paris,
Didier.
Martin, Ph. 1975. Analyse phonologique de la phrase française. Linguistics 146,
35–68.
WinPitch LTL, 2015. www.winpitch.com.
Exploring prosodic convergence in Italian game
dialogues
Michelina Savino1, Loredana Lapertosa1, Alessandro Caffò1, Mario Refice2
1 Dept. of Education, Psychology, Communication, University of Bari, Italy
2 Dept. of Electrical and Information Engineering, Polytechnical Univ. of Bari, Italy
Abstract
In this study we explore the manifestation of prosodic convergence between pairs of
Italian speakers involved in a non-competitive game. Results show evidence of
prosodic convergence and/or divergence between partners, where the prosodic
parameters and coordination strategies involved can vary across dialogue pairs.
Also, the degree of asymmetry in prosodic convergence appears to be related to
speaker empathy.
Keywords: Prosodic convergence, game dialogues, Italian, Big Five Questionnaire
Introduction
Conversational partners have been observed to adapt to each other’s speech
over the course of the interaction. This phenomenon, variously termed
convergence, entrainment, alignment, accommodation, or adaptation, is
considered crucial for mutual understanding and successful communication,
and is influenced by linguistic, social, and cultural factors (e.g. Giles et al.
1973). A body of research has been devoted to measuring the prosodic
parameters involved in speech adaptation in a number of languages (for
example, Levitan et al. 2015), not including Italian. This paper offers a
preliminary investigation of prosodic convergence between Italian
interactants, and explores the influence of speakers’ personality traits on the
convergence process.
Method
Corpus
Our corpus consists of five dialogues where pairs of players are involved in a
modified version of the old Chinese Tangram Game, as developed within the
PAGE project. Participants in a game round were given Tangram figures
according to their role: the Director received a set of four Tangram
figures, one of which was marked by an arrow; the Matcher was given one
of the figures belonging to the Director’s set. Players could not see each
other’s figures, and the goal of the game in each round was to establish – on the
basis of common agreement – whether the figure given to the Matcher was
the same as the one marked by the arrow in the Director’s set, or not. The
game session consisted of 22 rounds, with an average duration of 30 min.
Speakers were selected according to a number of parameters which could
influence adaptation, namely age, gender, and familiarity. They were all
young adult females (aged 21-24), and MA student classmates. After the
recording sessions, participants were administered the Big Five
Questionnaire (BFQ-2, Caprara et al. 2007), a protocol used in psychology
for assessing individual “Big Five Personality Factors” (Energy,
Friendliness, Conscientiousness, Emotional Stability, Openness) along with
their subdimensions.
Measuring convergence
Given the exploratory nature of this study, we started by focussing on global
aspects of speech coordination, i.e. those referring to similarity processes
operating at the level of the whole dialogue. We basically follow the
approach proposed by Edlund et al. (2009) in defining similarity as
underlain by a) convergence, the process by which conversational partners’
speech features become more similar over time until they converge; b)
synchrony, when speakers’ speech features happen to show similar patterns over
time. Due to space limitations, only results on convergence are
presented in this paper. We looked for evidence of convergence by identifying
cases in which speakers’ mean values were more similar to each other later in the
dialogue. Accordingly, we split each game session into two halves: a
window consisting of rounds 1-11 vs. another window including rounds 12-22.
Within each of the two windows, we compared the mean values of speaker1
vs. speaker2. Mean values found to be significantly different in the first half
but not in the second half were considered as evidence of convergence. Note
that convergence can be realised in the opposite direction as a
complementary manifestation of adaptation, i.e. divergence (Healey et al.
2014). Consequently, under our hypothesis mean values found to be not
significantly different in the first half but significantly different in the second
half of the session were considered as evidence of divergence. All other
cases were not taken as evidence of convergence or divergence.
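The decision rule described above can be sketched as a small function. It takes the p-values of the two speaker1 vs. speaker2 comparisons (first and second half of the session) and returns the label that the rule assigns; the function name, the label strings and the default significance level of .05 are illustrative assumptions, and the p-values themselves would come from whatever test is applied to each half.

```python
def classify_adaptation(p_first_half, p_second_half, alpha=0.05):
    """Apply the half-vs-half decision rule to two p-values.

    different -> not different : convergence
    not different -> different : divergence
    anything else              : no evidence either way
    """
    sig_first = p_first_half < alpha
    sig_second = p_second_half < alpha
    if sig_first and not sig_second:
        return "convergence"
    if not sig_first and sig_second:
        return "divergence"
    return "no evidence"
```

Note that under this rule two speakers who differ significantly in both halves are not counted as diverging, even if the difference between them grows.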
Results
Prosodic convergence/divergence
Table 1 shows the results of the speaker1-speaker2 mean value comparison for
each prosodic parameter, in the first vs. second halves of each game session.
We found statistical evidence of convergence and/or divergence in four out
of five dialogues: speakers in dialogue PZ become more similar in their
voice loudness in the second part of the dialogue (convergence), whereas
speakers in dialogue RC show complementary convergence by significantly
diverging in their articulation rate in the second half of the session. In
dialogues DS and CD we found both types of manifestation of overall
coordination: participants in dialogue DS converge in their articulation rate
and diverge in the loudness of their voices, whereas speakers in dialogue CD
converge in pitch range and diverge in pitch level. Speakers converging on
some speech features yet diverging on others in the same interaction
have been reported before (e.g. Bilous & Krauss, 1988, Edlund et al. 2009).
Table 1. Comparison of speaker1 vs. speaker2 mean values in the first vs. second
halves of dialogue (two-tailed t-test, t values only when significant: *=p<.05,
**=p<.01, ***=p<.001). Light gray shaded boxes indicate convergence; dark gray
shaded ones indicate divergence.
Table 2. Mean value differences (2nd–1st halves of dialogue) for each speaker in
dialogues where convergence and/or divergence were observed, along with
individual T scores for “Empathy” as assessed by the BFQ-2.
Dialogue  Speaker  Convergence      Divergence       Empathy
                   2nd-1st halves   2nd-1st halves   (BFQ-2 T scores)
                   (mean values)    (mean values)
CD        sp1      10.12            9.50             58
          sp2      18.31            -2.50            70
DS        sp1      0.01             -0.90            56
          sp2      0.46             0.49             65
PZ        sp1      -0.04            -                59
          sp2      -2.53            -                76
RC        sp1      -                -0.43            61
          sp2      -                0.03             72
References
Bilous, F.R., Krauss, R.M. 1988. Dominance and accommodation in the
conversational behaviours of same- and mixed-gender dyads. Language and
Communication 8, 183-194.
Boersma, P. 2001. Praat, a system for doing phonetics by computer. Glot
International 5(9/10), 131-151.
Caprara, G.V., Barbaranelli, C., Borgognani, L., Vecchione, M. 2007. Big Five
Questionnaire-2, Giunti: Firenze.
Edlund J., Heldner M., Hirschberg J. 2009. Pause and gap length in face-to-face
interaction. In: Proceedings of Interspeech 2009, 2779-2782, Brighton, UK.
Giles, H., Taylor, D.M., Bourhis R.Y. 1973. Towards a theory of interpersonal
accommodation through speech. Language in Society 2, 177-192.
Healey P., Purver M., and Howes C. 2014. Divergence in dialogue. PloS one 9(6)
e98598, 1-6.
Levitan, R., Benus, S., Gravano A., Hirschberg J. 2015. Acoustic-prosodic
entrainment in Slovak, Spanish, English and Chinese: A cross-linguistic
comparison. In Proceedings of SIGDial 2015, 325-334, Prague, Czech Republic.
PAGE (Prosodic And Gestural Entrainment in conversational interaction in diverse
languages) project: http://page.home.amu.edu.pl/
Syllable cueing and segmental overlap effects in
tip-of-the-tongue resolution
Nina Jeanette Sauer
Goethe-University Frankfurt, Phorms Education Frankfurt
Abstract
The tip-of-the-tongue (TOT) phenomenon refers to a temporary word finding
failure. To induce TOTs in the lab, a common method is to ask for terms after
providing created definitions. When in a TOT, syllable cues were presented in order
to manipulate TOT resolution. After the presentation of the correct first syllable of
the target word, TOTs could be resolved faster and more accurately than after the
presentation of an incorrect syllable of some other word or the control condition
(Experiment 1: syllable cueing effect). The presentation of the extended syllable of
the word (the first syllable with one more segment) facilitated TOT resolution and
boosted lexical retrieval even more than the regular syllable (Experiment 2:
segmental overlap effect).
Key words: tip-of-the-tongue (TOT), resolution, cueing, syllable, segmental overlap
Introduction
The tip-of-the-tongue phenomenon (TOT) represents a temporary
impairment in speech production. When experiencing a TOT, one has access
to semantic (concept) and syntactic information (lemma) but only partial
access to phonological information (lexeme). While the complete word form
cannot be retrieved, one has a strong feeling of knowing the word and “recall
is felt to be imminent” (Brown & McNeill 1966, p. 325). Often, speakers are
able to retrieve the first letter or phoneme, the number of syllables and also
words with similar sound and similar meaning (Brown 2012, p. 196).
In order to induce TOTs in a laboratory setting, definitions were
presented on a computer screen, for example, “a lift consisting of a series of
linked compartments moving continuously” for paternoster. In the cueing
paradigm so far, syllable cues were embedded in words or pseudowords, and
presented in word lists in order to manipulate TOT resolution (for an
overview, see Hofferberth-Sauer & Abrams 2014). Abrams, White, and Eitel
(2003) illustrated, for example, that the entire first syllable is required for
TOT resolution – the first phoneme or first grapheme alone had no effect. In
the present studies, syllable cues were presented in isolation. The advantage
of this procedure is that the syllable itself has no semantic and syntactic
information. The presentation of isolated correct, incorrect, and extended
syllables is new in TOT research.
Previous studies
In the pre-tests, definitions had been collected and verified (Hofferberth,
2011). In two pilot studies (Hofferberth 2012), the design of the experiment
was evaluated, and more definitions were collected and validated.
Thereafter, two experiments were performed. The first experiment
(Hofferberth 2014; Hofferberth-Sauer & Abrams 2014) will be presented
here only marginally while the focus is on the second experiment (cf. 3.). All
the data was collected within my Ph.D. project (Sauer 2015).
Experiment 1
In the first experiment, definitions were presented on a computer screen.
When in a TOT, one of three cues was presented. It was shown that after the
presentation of the correct syllable (e.g., pa for paternoster), TOTs could be
resolved about twice as fast compared to after an incorrect syllable (e.g., co)
and to the control condition (xxx). The correct syllable also led to
significantly more accurate answers (M = 73.5%, SD = 18.6%) compared to
the control condition (M = 24.3%, SD = 16.4%, t(47) = 16.39, p < .001), and
to the incorrect syllable (M = 16.0%, SD = 13.6%, t(47) = 20.06, p < .001).
The control condition led to significantly more accurate TOT resolutions
compared to the incorrect syllable (t(47) = 3.71, p = .001). The incorrect
syllable did not block TOT resolution (not leading to more inaccurate
answers), but there was an inhibition effect: There were fewer accurate
answers and more unresolved TOTs. After demonstrating the cueing effect
of the first syllable in Experiment 1, a further experiment was conducted in
order to test if the syllable border plays a role (syllable preference effect).
Experiment 2
Method
Participants
69 under- and postgraduates (42 female, 27 male) between 21 and 35 years
(M = 27.9 years, SD = 4.3) participated in this study.
Apparatus and material
The material was visually presented on a computer screen using the program
Presentation. There were 240 definitions of German nouns presented in
order to induce TOTs (the English examples here are only for demonstration
purposes).
Procedure
The subjects were told to press a button on the keyboard as fast as possible
indicating that they know the word (KNOW), that they do not know the
word (DON’T KNOW), or that the word is on their tip of the tongue (TOT).
They had 10 seconds to react to the definition. After pressing KNOW, they
typed in the answer, and another definition was presented. After pressing
DON’T KNOW, the next definition appeared on the screen. After pressing
TOT, a cue was presented visually: either the regular syllable (e.g., pa for
paternoster), the extended syllable (e.g., pat), or the control condition
(marked by xxx). The cue was presented for 25 seconds. In this time, the
subjects had to type in their answer.
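The trial logic described above can be sketched as follows. This is only an illustration, not the Presentation script used in the study: the function names are hypothetical, the extended cue is built on the simplifying assumption that one letter corresponds to one segment, and moving on after a timeout is an assumption that the text does not state.

```python
DEFINITION_TIMEOUT_S = 10  # time allowed to react to a definition
CUE_DURATION_S = 25        # time the cue stays on screen


def make_cue(target, first_syllable, condition):
    """Return the cue string for one of the three cue conditions."""
    if condition == "regular":
        return first_syllable
    if condition == "extended":
        # First syllable plus one more segment (assumed: one letter per segment).
        return target[:len(first_syllable) + 1]
    if condition == "control":
        return "xxx"
    raise ValueError(f"unknown condition: {condition}")


def next_step(response):
    """Map a button press to what happens next in the trial."""
    if response == "KNOW":
        return "type_answer"
    if response == "TOT":
        return "show_cue"
    # DON'T KNOW, or no response within the time limit (the latter is an assumption).
    return "next_definition"


for condition in ("regular", "extended", "control"):
    print(condition, make_cue("paternoster", "pa", condition))
```

For the example target paternoster this yields the cues pa, pat and xxx given in the text.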
Results
TOT rate
The number of TOTs per participant varied between 21 (8.8%) and 194 (80.8%).
Across all 16560 stimuli (69 participants × 240 definitions), 5600 TOTs were
induced, i.e., the TOT rate was 33.8% (SD = 14.7%), with 81 TOTs per person on
average. Out of the
5600 TOTs, 3385 TOTs (60.5%) were resolved in the given time of 25
seconds, with reaction times (RTs) between 571 ms and 24948 ms (M =
4049 ms, SD = 4325 ms). There were 50.3% accurate answers, and 10.2%
inaccurate answers.
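The descriptive figures above can be re-derived from the reported counts. The short check below uses only numbers given in the text; the variable names are illustrative.

```python
participants = 69
definitions_per_participant = 240
tots_induced = 5600
tots_resolved = 3385

stimuli = participants * definitions_per_participant   # total stimuli presented
tot_rate = 100 * tots_induced / stimuli                 # percent of stimuli ending in a TOT
tots_per_person = tots_induced / participants           # mean number of TOTs per participant
resolved_share = 100 * tots_resolved / tots_induced     # percent of TOTs resolved in time

print(stimuli, round(tot_rate, 1), round(tots_per_person), round(resolved_share, 1))
```

The TOT rate and the mean number of TOTs per person match the reported values. The resolved share computes to roughly 60.4%, marginally below the reported 60.5%, which equals the sum of the reported accurate (50.3%) and inaccurate (10.2%) shares; the difference is presumably a matter of rounding.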
Cue analysis
The number of accurate TOT resolutions differed between the three types of
cues (F(2, 136) = 415.65, p < .001). With the extended syllable, TOTs were
accurately resolved significantly more often (M = 72.0%, SD = 18.7%) in
comparison to the regular syllable (M = 60.3%, SD = 19.0%, t(68) = 7.00, p
< .001), and to the control condition (M = 18.7%, SD = 13.0%, t(68) = 26.26,
p < .001). The regular syllable also led to significantly more accurately resolved
TOTs than the control condition (t(68) = 19.80, p < .001).
The RTs were significantly shorter after the presentation of the extended
syllable (M = 2330 ms, SD = 887 ms) in comparison to the regular syllable
(M = 2803 ms, SD = 1166 ms, t(67) = 3.92, p < .001), and to the control
condition (M = 3017 ms, SD = 1592 ms, t(62) = 2.89, p = .005). There was
no significant difference between the regular syllable and the control
condition (t(62) = 0.78, p = .436).
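The pairwise comparisons above report 68 degrees of freedom for 69 participants, which is consistent with a paired (within-subject) t-test. Assuming that this is the test used, the statistic can be sketched as below. The accuracy values are hypothetical and serve only to show the computation; p-values are not computed.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two equally long samples."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical per-participant accuracy (%) in two cue conditions.
extended = [80, 70, 75, 60, 90]
regular = [70, 65, 60, 55, 80]

t, df = paired_t(extended, regular)
print(f"t({df}) = {t:.2f}")
```

With 69 participants contributing to both conditions, the same function would return 68 degrees of freedom, as in the accuracy analysis. The lower values reported for the RT comparisons (67 and 62) would follow if some participants had no usable RT in one of the conditions, which is an inference rather than something the text states.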
Discussion
While Experiment 1 showed the syllable cueing effect, i.e., the correct first
syllable helped to overcome transmission deficits from the lemma to the
lexeme level, Experiment 2 showed the segmental overlap effect, i.e. a
speaker needs even more than the first syllable for successful TOT
resolution. It was demonstrated that the extended syllable (e.g., pat for
paternoster) significantly speeded up lexical access (shorter RTs), and
significantly increased TOT resolution (more accurate answers) compared to
after the regular syllable (e.g., pa) and to the control condition (xxx). The key
factor was not the syllable per se but the information content: the bigger the
segmental overlap between cue and target, the faster and better the TOT
resolution. Therefore, it is helpful to get as much information as possible
about the beginning of the target word. The unit of the syllable only plays a
marginal role.
Syllable cueing and segmental overlap effects do not have to exclude
each other but rather can both be explained within speech production models
that allow for an interactive activation spreading and have a syllable level
below the phoneme level. For an interpretation and discussion of these
results within different models of speech production see Sauer and Schade
(2016).
References
Abrams, L., White, K.K., Eitel, S.L. 2003. Isolating phonological components that
increase tip-of-the-tongue resolution. Memory & Cognition, 31, 1153-1162.
Brown, A.S. 2012. The tip of the tongue state. New York, Psychology Press.
Brown, R., McNeill, D. 1966. The "tip of the tongue" phenomenon. Journal of
Verbal Learning and Verbal Behavior, 5, 325-337.
Hofferberth, N. J. 2011. The tip-of-the-tongue phenomenon: Search strategy and
resolution during word finding difficulties. Proc. 4th ISCA Tutorial and
Research Workshop on Experimental Linguistics, ExLing 2011, 83-86. Paris,
France.
Hofferberth, N. J. 2012. On the role of the syllable in tip-of-the-tongue states. Proc.
International Conference of Experimental Linguistics, ExLing 2012, 57-60.
Athens, Greece.
Hofferberth, N. J. 2014. Resolution of lexical retrieval failures. Reaction time data in
the tip-of-the-tongue paradigm. Proceedings of the International Seminar on
Speech Production. ISSP 05-08 May 2014, 194-197. Cologne, Germany.
Hofferberth-Sauer, N.J., Abrams, L. 2014. Resolving tip-of-the-tongue states with
syllable cues. In Torrens, V. and Escobar, L. (eds.), The processing of lexicon
and morphosyntax, 43-68. Newcastle, Cambridge Scholars Publishing.
Sauer, N.J. 2015. Das Tip-of-the-Tongue-Phänomen. Zur Rolle der Silbe beim
Auflösen von Wortfindungsstörungen. Doctoral dissertation, Frankfurt am
Main, Johann Wolfgang Goethe-Universität. doi: 10.13140/RG.2.1.1229.8645
Sauer, N. J. and Schade, U. 2016. Über die Entstehung und Auflösung von
Versprechern und Tip-of-the-Tongue-Zuständen. Manuscript in preparation.
An experimental study of English accent
perception
Elena Shamina
Department of Phonetics, Saint Petersburg State University, Russia
Abstract
The study aims to verify the observation that in the perception of spoken English,
sociolinguistic evaluation prevails over personal evaluation. A total of 10 speech
samples by 2 native English speakers with no special phonetic or acting training,
imitating various English accents, were evaluated by 26 native English speakers on a
number of scales related to sociolinguistic and personal factors. When listening to the
same persons speaking in different varieties of English, the respondents ascribed to
them very different social qualities, such as social class, education and occupation.
The personality properties ascribed, such as character traits and age, are shown to
depend on the social factors associated with the accent.
Key words: sociolinguistics, perception, English accents, social and personal
qualities
heard Speaker 1 imitating a foreign (French) accent they were more reticent
in their social judgment and tended to place him in the middle of the social
ladder (lower and upper middle class in 73% of the responses). They were
also rather at a loss when defining his professional qualifications and
mentioned, among others, such inconspicuous occupations as “traveler, poet,
teacher, tourist agent, student”.
Conclusion
The study data are consistent with the results of the earlier research into
sociolinguistic values of English accents. What it emphasizes is the
astonishing fact that, in perceiving accented speech, speakers of English
concentrate almost exclusively on social factors, and evaluation of the
Acknowledgements
The author would like to express sincere appreciation of Evgenia Sokolova’s
assistance in conducting the experiment.
References
Abramova, I.E. 2009. Phonetic variation outside the natural language environment.
Petrozavodsk, Petrozavodsk University Press. (In Russian)
Coupland, N. and Bishop, H. 2007. Ideologised values for British accents. Journal
of Sociolinguistics, vol. 11, issue 1, 74-93.
Labov, W. 1966. The social stratification of English in New York City. Washington,
D.C., Center for Applied Linguistics.
Shamina, E.A. 2011. Subjective evaluation of the phonetic representation of some
national and regional varieties of the English language. In S. Androsova (ed.),
Proceedings of the 1st International Conference “Phonetics without Borders”,
96-98. Blagoveshchensk, Russian Federation.
Shamina, E.A. 2012. On objectivity of subjective evaluation of some national and
regional English accents. In L.A. Verbitskaya, N.K. Ivanova (eds.), Homo
speaking: XXI century research, 150-155. Ivanovo, Ivanovo State University
of Chemical Technology Press. (In Russian)
Wells, J.C. 1982. Accents of English, vol. 1. Cambridge, Cambridge University
Press.
Phonetic words duration simulation using Deep
Neural Networks
Alexander Shipilo
Saint-Petersburg State University, Russia
Abstract
Deep Neural Networks (DNN) are widely used in speech prediction and speech
modeling. The current paper describes the implementation of a DNN for the task of
predicting the duration of speech units (allophones and syllables that form the
structure of the phonetic word and the intonation phrase). It is well known that
numerous factors influence the duration of segments. However, the level of confidence
of these characteristics differs significantly. It was found that the deep neural
network that predicts allophone duration shows better results than the network that
predicts syllable duration.
Key words: deep NN, duration modeling, phonetic words.
Introduction
One of the challenging tasks in text-to-speech systems is the
duration modeling of speech units. Although recent research addresses the
problem of lengthening and shortening speech units, unit selection
systems demonstrate better naturalness (Lobanov, Tsirulnik, 2007, 2008).
The duration of speech segments varies significantly depending on the
position within the intonation unit and the phonetic word and on the number of
elements in the speech unit (Svetozarova, 2014). Each allophone has its own
intrinsic duration, and many factors are known to influence segment
duration.
The Python Toolkit for Deep Learning (PDNN) was used in the current
research. The general architecture of the developed system is shown in
Fig. 1.
Material
The Corpus of Professionally Read Speech (CORPRES) was used in the
current research (Skrelin et al., 2010). For the pilot experiment, the
recordings of one female speaker (approx. 6 hours of speech, 155591
allophones, 61591 syllables) were chosen. Each recording has the following
manually checked annotation levels:
T = ( D1 / N ) / (D2 / N ),
Experiments
Four experiments were performed. The first two deal with syllable duration
prediction (models 1, 2), the other two with allophone duration prediction
(models 3, 4). Let us consider the experimental techniques.
Unfortunately, predicting the real duration of a segment is a rather
difficult task. To simplify it, rounded values were predicted. Model 1
predicts the percent deviation from the average duration of all syllables in the
material; model 3 predicts the percent deviation from the average duration of the
given allophone. For example, consider a segment that is lengthened by 10 percent,
i.e. whose required value is 110 %. This value is rounded to the nearest
multiple of 25 percent (e.g. 110 to 100 %, 120 to 125 %, etc.).
Table 1 shows the features that were used in the model.
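The rounding of percent deviations to steps of 25 percent can be sketched as follows. The function name is hypothetical, and rounding exact ties (e.g. 112.5 %) upwards is an assumption, since the text does not say how ties were handled.

```python
from math import floor


def round_to_step(percent, step=25):
    """Round a percent value to the nearest multiple of `step` (ties round up)."""
    return step * floor(percent / step + 0.5)


for value in (110, 120):
    print(value, "->", round_to_step(value))
```

The two values reproduce the examples given in the text: 110 % is mapped to 100 % and 120 % to 125 %.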
Models 2 and 4 predict a rounded number which, multiplied by the minimum
auditorily perceptible duration step of 30 ms, gives the real duration.
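Predicting a rounded number of 30 ms steps rather than a raw duration can be sketched as below. How durations were mapped to such numbers is not specified in the text, so rounding to the nearest step is an assumption and the function names are hypothetical.

```python
from math import floor

STEP_MS = 30  # minimum perceptible step named in the text


def duration_to_steps(duration_ms):
    """Number of 30 ms steps closest to the given duration (assumed mapping)."""
    return floor(duration_ms / STEP_MS + 0.5)


def steps_to_duration(steps):
    """Duration in ms recovered from a predicted number of steps."""
    return steps * STEP_MS


for duration in (50, 109):
    steps = duration_to_steps(duration)
    print(duration, "ms ->", steps, "steps ->", steps_to_duration(steps), "ms")
```

Under this mapping the prediction target becomes a small integer class, at the cost of a quantisation error of at most 15 ms.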
Each model has two hidden layers, and each layer contains 2048
elements.
Table 2. Results.
Model Prediction accuracy, %
1 20
2 51
3 45
4 79
As Table 2 shows, model 4 gives the best result and model 1 the
worst. Models that simulate syllable durations perform worse than
models that simulate allophone durations. Models 2 and 4 show better results
(51 and 79 percent) than models 1 and 3. The reasons are
that (1) the deviation depends on the average duration of the
target element, and (2) the deviation is a relative characteristic. Suppose
the average unstressed vowel allophone (for example /u/) lasts 50 ms. In
this case a ten percent lengthening means that the duration changes by 5 ms.
On the other hand, a ten percent change of the stressed allophone of
phoneme /a/ (average duration in the material: 109 ms) means that the
duration changes by approximately 11 milliseconds. If we predict the real
allophone duration (models 2, 4), the problem of differences in averages
disappears.
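The arithmetic behind this argument is shown below, using the two average durations quoted in the text. The function name is illustrative.

```python
def absolute_change_ms(average_ms, percent_change):
    """Absolute duration change implied by a relative (percent) change."""
    return average_ms * percent_change / 100


for name, average_ms in (("unstressed /u/", 50), ("stressed /a/", 109)):
    print(name, absolute_change_ms(average_ms, 10), "ms")
```

The same ten percent class therefore stands for about 5 ms in one allophone and about 11 ms in the other, which is why a relative target is harder to compare across segments than an absolute one.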
The results confirm the hypothesis that the selected features can be used
as predictors of segment durations, but the neural network provides no information
about the level of confidence of the features. Answering this question
requires additional study.
References
Lobanov, B.M, Tsirulnik, L.I., Rules of Speech Corpus Segmentation into Phonetic
Units and the Strategy of Unit Selection in Speech Synthesis,
http://www.dialog-21.ru/digests/dialog2007/materials/html/60.htm
Lobanov, B.M, Tsirulnik, L.I., Computer Synthesis and Speech Cloning, Minsk,
2008 / in Russian.
Matusevich, M.I., Modern Russian Language. Phonetics, 1976 / in Russian /
Sovremennij Russkij Yazik. Phonetika
Skrelin P., Kocharov D., Automatic processing of prosodic design of the utterance:
relevant prosodic features for automatic interpretation of intonation model,
2009, AP-2009, Saint-Petersburg / in Russian.
Skrelin, P., Volskaya, N., Kocharov, D., Glotova, O., Evdokimova V. CORPRES -
Corpus of Russian professionally read speech. In: Sojka, P., Horák A., Kopeček,
I., Pala, K. (eds.) TSD 2010. LNCS, vol. 6231, pp. 392-399. Springer,
Heidelberg (2010)
Svetozarova N.D., “Short” stressed vowels in the Russian language, Issues in
Phonetics 6, 2014 / in Russian.
Transcription: what is meant by accuracy and
objectivity?
Pavel Skrelin, Nina Volskaya
Department of Phonetics, Saint Petersburg State University, Russia
Abstract
The paper deals with the relationship and discrepancy between phonetic (acoustic)
characteristics of the speech signal and their phonological interpretation with the aim
of their reflection in segmental transcription and prosodic annotation of the speech
corpora.
Key words: phonetics and phonology, transcription, speech corpora
Introduction
The presentation draws attention to the interaction between acoustic,
phonetic and phonological aspects of the speech signal and their reflection in
transcription. Accuracy of phonetic transcription plays an important role in
the annotation of speech corpora. The requirements for precision to a great
extent depend on the annotators' expertise and on what the corpus is
designed for. If the corpus is to be used in TTS or ASR applications the
selected phonetic signs must be as close as possible to acoustic (spectral)
features of sounds analyzed in their physical boundaries. The traditional
"manual" segmental transcription is based on perception of a word or at least
a syllable and represents a human model of speech perception and sound
interpretation. As a result transcriptions using different methods and aimed
at different applications may differ. At the same time comparison of the
results of both transcription types dealt with in the presentation provides
information about speech perception mechanisms on the segmental (phonetic
representation of distinctive features) and suprasegmental levels
(discrepancy between acoustic and perceived forms of melodic patterns).
Figure 3. Schematic representation of the IC6 and IC3 with nuclear syllable in the
final position.
Conclusion
In real speech situations the distinctive features crucial for phonological
decision-making may not be present in the sound itself (which may be absent
altogether) but reflected in its right and/or left neighbours. This poses the
problem of the formal representation of the sound stream itself in automatic
interpretation (recognition), which is based on the acoustic parameters of
segments. A similar problem exists in the interpretation of F0 curves. As
long as we do not know exactly how the speech signal characteristics which
a person uses for phonological interpretation correlate with its objective
evidence, we need to use two ways of formal representation (transcription):
objective and abstract.
Notes
For speakers of languages other than Russian (German, English, Finnish) this
contour shape is interpreted as falling. In the English intonation system, for
example, it belongs to the phonologically falling complex rising-falling tone, the
Jackknife (O'Connor & Arnold, 1973).
References
Bryzgunova E. A. 1980 Intonation [intonacija], in: Russian Grammar, N. Shvedova,
Ed. Moscow: Nauka, vol. 2, pp. 96 – 122.
Kachkovskaia T., Kocharov D., Skrelin P., Volskaya N. 2016. CoRuSS - a new
prosodically annotated corpus of Russian spontaneous speech. in: Proceedings
of LREC 2016.
O'Connor J.D., Arnold G.F. 1973 Intonation of Colloquial English. Longman,
London.
Skrelin P., Volskaya N., Kocharov D., Evgrafova K., Glotova O., Evdokimova V.
2010. CORPRES - Corpus of Russian Professionally Read Speech. in: Text,
Speech and Dialogue, ser. Lecture Notes in Computer Science, P. Sojka, A.
Horák, I. Kopeček, and K. Pala, Eds. Springer Berlin Heidelberg, 2010, no.
6231, pp. 392–399.
Grammatical change and hindcast model
statistics – A comparison between Medieval
French and Brazilian Portuguese
Eduardo Correa Soares
Université Paris Diderot, CLILLAC-ARP EA 3967, Paris, France
Abstract
This paper presents a methodology to analyse the ongoing linguistic change in
Brazilian Portuguese [BP] as regards the pro-drop parameter. I propose to apply a
hindcast statistical regression model to a sample of data from Medieval French [MF],
whose outcome is the obligatory use of subjects in Modern French, and to compare it
to a sample from BP. The results suggest that the changes in the two languages
contrast and are due to different causes. While the change in MF appears to have
moved uniformly toward the non-pro-drop setting, the BP change seems to be a
by-product of a semantic preference of null subjects to corefer with inanimate and
non-specific antecedents.
Key words: Hindcast statistical model, grammatical change, pro-drop parameter,
Brazilian Portuguese, Medieval French.
Introduction
This paper proposes a new methodology to address the grammatical change
regarding the pro-drop parameter in Brazilian Portuguese [BP]. I propose that
a statistical hindcast regression model comparing Medieval French [MF] and
BP can verify whether some assumptions about BP are akin to what came
about in MF. This model is applied to two samples of data from MF and BP.
The results show seemingly diverging patterns of linguistic change.
BP is taken to be a language on its way to becoming non-pro-drop
(Tarallo 1983, Galves 1987, 1992, 1998, Duarte 1993, 1995, inter alia). In
many standard pro-drop contexts in other Romance languages (for instance,
European Portuguese, Spanish and Italian), an overt pronoun is indeed
obligatory in present-day colloquial spoken BP (see Duarte 1995, Barbosa,
Duarte & Kato 2005, inter alia), as in example (1) below.
(1) então a gente lê pra ele1 sentado ali... *(ele1) gosta...
So the people read.pres.3s for him seated there he like.pres.3s
“So there we read for him1 when seated down and he1 likes that.”
(NURC-RJ, inquiry_011, data_set: “70s”)
Such contexts and data have led many authors to suggest that BP is changing
due to the simplification of its agreement marking system, in line with the so-called
Taraldsen's generalization (Roberts 1993, 2014, Kato 1999, 2000, inter alia).
In this vein, it has been proposed that BP is following the same path
that French took from MF to Modern French (notable exceptions
to this claim are Kaiser 2009 and Roberts 2014). In MF, overt and null
pronouns were in apparent free variation, as in (2) below.
(2) Aucassins1 s' en est tornés / (...) Vers le palais _1 est alés / il1 en monta les degrés
/ une canbre _1 est entrés / si _1 comença # a plorer
“Aucassin1 departed/ to the palace he1 went / he1 went upstairs / (into) a bedroom
he1 entered / this way he1 began to weep”
(SRCMF, aucassin, data_set: “XII_century”)
In the next section, I propose a hindcast statistical model, applying
inferential logistic regression to data from MF and from BP.
Methodology
I propose to use a hindcast model to compare the change regarding the pro-
drop parameter in BP and French. This methodology consists of (i) analysing
a set of data from a specific period of time whose outcome is already known;
(ii) statistically describing what has taken place and testing for some
parameters and (iii) predicting possible similarities and differences from
another set of data by changing or adding one or more parameters.
I have analysed the MF change (Adams 1987), whose outcome has been the
non-pro-drop status of Modern French. I have compared this hindcast
analysis of MF data to BP data in order to evaluate the status of the current
so-called “on-going” change in BP. I have taken 9 texts from the historical
MF corpus SRCMF (note 1), 6 interviews from the BP corpus NURC-RJ (3 carried out
in the 70s and 3 in the 90s; note 2) and 3 movie subtitles produced after 2010 from
the OPUS corpora project (note 3). These texts were automatically annotated. The
sample was gathered with a concordance toolkit. The MF subcorpus thus
consisted of 1500 sentences (half of them without a subject), distributed
into 3 subsets according to the date of the text (group1, the IXth and
Xth centuries; group2, the XIth and XIIth centuries; and group3, from the
XIIIth century on). The BP corpus likewise consisted of 1500 sentences
(50% of them subjectless), split into 3 subsets: group1, data from
the 70s; group2, from the 90s; and group3, data from 2010 on. The collected
data were then analysed with a Generalized Linear Model using the software
R, with the packages lme4, languageR and stats.
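The logistic regression step can be illustrated with a small stand-in written in Python rather than R. This is not the analysis reported here: it fits a single predictor (the period group, coded 0, 1, 2) instead of the factors examined in the study, it uses Newton-Raphson on grouped counts instead of lme4, and the per-group counts are hypothetical. Only the total of 1500 sentences with half of them subjectless follows the description above.

```python
from math import exp, log


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(groups, iterations=25):
    """Fit P(null subject) = sigmoid(b0 + b1 * period) by Newton-Raphson.

    groups: list of (period, n_sentences, n_null_subjects).
    Returns (intercept, slope) on the log-odds scale.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, n, k in groups:
            p = sigmoid(b0 + b1 * x)
            w = n * p * (1.0 - p)
            g0 += k - n * p          # gradient of the log-likelihood
            g1 += x * (k - n * p)
            h00 += w                 # entries of the information matrix
            h01 += x * w
            h11 += x * x * w
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical counts: 500 sentences per period, 750 null subjects in total.
sample = [(0, 500, 350), (1, 500, 250), (2, 500, 150)]
intercept, slope = fit_logistic(sample)
print(round(intercept, 3), round(slope, 3))
```

A negative slope means that the odds of a null subject fall from one period to the next. In a hindcast setting, the model fitted to the language whose outcome is known would then be compared with the model fitted to the language still undergoing change.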
Results
Table 1 sums up the logistic regression analysis and the results obtained. In
French, the so-called impoverishment of agreement marking has
predominantly affected singular forms and 3rd person plural. The fixation of
non-pro-drop in MF is taken to be a strong effect of such an impoverishment
Discussion
This pilot corpus study suggests that the null subjects in BP are becoming
scarcer in a way different from MF. Firstly, the null subject in BP is
crucially likely to be 3rd person singular. This person is the least marked form
in BP (Kato 1999). In MF, no significant difference concerning person,
animacy or specificity was found. The MF change can also be related to
other factors (e.g. the use of clitic subject pronouns). In BP such a difference
seems to be a semantic-functionally motivated by-product of two factors –
the semantic features animacy and specificity. This difference can be crucial
to shed light on the partial pro-drop status of BP (Biberauer et al. 2010) and
the non-pro-drop status of modern French (Adams 1987).
Notes
1. See http://srcmf.org and Prévost & Stein (2013) for more information.
2. This corpus is available online at http://www.letras.ufrj.br/nurc-rj/.
3. See http://opus.lingfil.uu.se/OpenSubtitles2016/ and Lison & Tiedemann (2016).
Acknowledgements
I am thankful to the CAPES Foundation for the financial support of this
research and to my supervisors Philip Miller, Barbara Hemforth and Sergio
Menuzzi, who gave me to-the-point advice on carrying out my projects.
References
Adams, M. 1987. From Old French to the Theory of Pro-drop. Natural Language and
Linguistic Theory 5: 1-32.
Barbosa, P., Duarte, M. E. L. & Kato, M. A. 2005. Null subjects in European and
Brazilian Portuguese. Journal of Portuguese Linguistics 4. v. 2: 11-52.
Biberauer, T., Holmberg, A., Roberts, I., Sheehan, M. 2010. Parametric Variation.
CUP.
Cyrino, S. M.L.; Duarte, M.E. L., Kato, M. A. Visible subjects and invisible clitics in
Brazilian Portuguese. In: Kato, M.A. & Negrão, E.V. (eds.). 2000. 55-104.
Duarte, M. E. L. 1993. Do pronome nulo ao pronome pleno. In Roberts, I., Kato, M.A.
(eds.): 107-28.
Duarte, M.E.L. 1995. A Perda do Princípio "Evite pronome" no Português Brasileiro.
Campinas, SP, UNICAMP: Ph.D. Dissertation.
Galves, C. 1987. A sintaxe do português brasileiro. Ensaios de lingüística 13: 31-50.
Galves, C. 1993. O enfraquecimento da concordância no Português Brasileiro. In:
Roberts & Kato (eds.): 387-408.
Galves, C. 1998. Tópicos e sujeitos, pronomes e concordância no português do
Brasil. Cadernos de Estudos Lingüísticos, 34: 19-32.
Kaiser, G. A. 2009. Losing the null subject. A contrastive study of (Brazilian)
Portuguese and (Medieval) French. In Proceedings of the Workshop Null-subjects,
expletives, and locatives in Romance: 131–156.
Kato, M. A. 1999. Strong pronouns, weak pronominals and the null subject parameter.
PROBUS, 11, 1: 1-37.
Kato, M. A. 2000. The partial pro-drop nature and the restricted VS order in Brazilian
Portuguese. In: Kato, M. A. & Negrão, E. V. (eds). 2000. 223-258.
Kato, M. A. & Negrão, E. V. (eds). 2000. The Null Subject Parameter in Brazilian
Portuguese. Frankfurt-Madrid: Vervuert-IberoAmericana.
Lison, P., Tiedemann, J. 2016. OpenSubtitles2016: Extracting Large Parallel
Corpora from Movie and TV Subtitles. In Proceedings of the 10th International
Conference on Language Resources and Evaluation (LREC 2016)
Prévost, S.; Stein, A. 2013. Syntactic Reference Corpus of Medieval French
(SRCMF). ENS de Lyon/ILR Stuttgart.
Roberts, I. 2014. Taraldsen’s Generalization and Language Change. Prepublished ms.
Roberts, I. & Kato, M. A.(eds.) 1993. Português Brasileiro: uma viagem diacrônica
(Homenagem a Fernando Tarallo). Campinas, SP: Editora da UNICAMP.
Tarallo, F. 1983. Relativization Strategies in Brazilian Portuguese. University of
Pennsylvania: Ph.D. Dissertation.
The phonetics of Russian North Bylinas
Svetlana Tananaiko1, Marina Agafonova2
1 Department of Phonetics, Saint-Petersburg State University, Russia
2 Department of Phonetics, Saint-Petersburg State University, Russia
Abstract
The Internet site presenting the bylinas of the Russian North from the Sound Records
Archives of the Institute of Russian Literature was created in 2014. The aim of the
site is to give free access to unique Russian folklore sound recordings, made
throughout the 20th century, to everybody interested, especially those who study
anthropology, folklore, dialects and dialect phonetics of Russian, because the
sound fragments presented on the site are analyzed in all these aspects. The article
describes the phonetic characteristics of North bylinas revealed so far and suggests
a theoretical interpretation of the dynamics of dialect phonetic change.
Key words: dialect phonetics; Northern Russian dialect zone; Russian North bylinas
on the other hand, by the absence of /š’:/ and the presence of only one
affricate instead of two (Avanesov 1949).
The phonetic realization of vowels and consonants, even those
shared with Standard Russian, differs in these dialects from the
Standard. The vowels are diphthongs or diphthongoids, the
palatalized sibilants are lisping, and so on (Meshchersky 1972).
The rules of phoneme distribution and the rules of alternation
also differ from the Standard. For example, the unstressed
vowel system retains unstressed /o/ and /e/, and in the consonant system
there are cluster simplifications (/mm/ instead of /bm/, /s’/
instead of /s’t’/) (Kolesov 2006).
References
Avanesov R. 1949. Essays of Russian Dialectology. Moscow.
Corpus of Russian Folklore. Bylinas. Sound Analogue: Internet site.
URL: http://www.zvukbyliny.pushkinskijdom.ru/.
Kolesov V. et al. 2006. Russian Dialectology. Moscow.
Meshchersky M. (ed.) 1972. Russian Dialectology. Moscow.
Tananaiko S. 2001. Russian Dialects in Non-Slavonic Surrounding. In Verbitskaya
L., Vasilkova V., Kozlovsky V., Skvortsov N. (eds.), Comparative Collection:
Miscellany of Sociological and Humanitarian Studies, 173-185,
Saint-Petersburg.
Association experiment in practice of linguistic
and cultural dominants research
Svetlana Takhtarova1, Diana Sabirova2
1 Dept of Theory and Practice of Translation, Kazan Federal University, Russia
2 Dept of European Languages and Cultures, Kazan Federal University, Russia
Abstract
The paper presents an experimental study of changes in the structure of the cultural
dominants of German ethnosociety, using the linguocultural concept of Ordnung as
an example. To support well-grounded conclusions on the status of the problem and
to determine the axiological characteristics of the concept, the authors carried out
an associative experiment. The respondents were asked to write several words in
response to given stimulus words. The experiment confirms that cultural constants
are dynamic formations which are bound to change. The changes characteristic of
Ordnung as a cultural dominant inevitably involve modification of the German
communicative style, shown in particular in greater tolerance of deviations from
norms and standards and a smaller degree of criticality and straightforwardness.
Key words: associative experiment, concept, cultural dominants.
Introduction
Cultural concepts, the most important category of cultural linguistics, are
actively studied on the material of different languages and cultures. The main
characteristic of a linguocultural concept is, as is well known, its value
component (Karasik 2004). Cultural dominants, the most important concepts for
a given culture, constitute the value core of the worldview peculiar to that
culture.
The Ordnung concept, which is the subject of this article, is traditionally
considered one of the key cultural landmarks of German ethnosociety
(Bartminsky 2005, Medvedeva 2007, Ter-Minasova 2007, Markowsky 1995,
Matussek 2006). Wierzbicka notes that Germans should have Ordnung
(order) and live in a world where Ordnung “reigns”. In fact, only
Ordnung can guarantee their inner peace (Wierzbicka 1999). According to
Bausinger, the untranslatability of the German words Ordnungsamt,
Ordnungswidrigkeit, Ordnungsstrafe, ordnungspolitische Massnahme
proves that the concept of order has an idioethnic character in German
society. In this context order is not only a social principle, limiting every
single person to a particular behavioral pattern or framework, but also a
norm which every person adheres to without any coercion (Bausinger 2002).
At the same time, cultural dominants, despite their rigidity, can change over
time, just as the culture and the society evolve.
Results
The experiment made it possible to identify the following features of the
Ordnung concept.
Firstly, most of the associations given to the stimulus word by elderly
respondents, i.e. the third age group, are axiomatic phrases and clichés:
Ordnung muss sein (31%) and Ordnung ist das halbe Leben (26%). It is
indicative, in our view, that such phrases appear only sporadically in the
responses of informants from the first and second groups.
Secondly, such verbal responses as wichtig, notwendig, sehr wichtig,
sehr positiv were given by representatives of the third group, thus
confirming the normative-evaluative nature of the analyzed concept. The
responses of informants from the first and second groups are far less
"axiological": 4% and 12%, respectively. Moreover, verbal responses
submitted by youth group respondents reflect not only a positive but also a
negative perception of the stimulus word: einschränkend, überschätzt,
bremst Kreation, Druck, nicht immer. Overall, negative associations are
infrequent (16%), but their presence in the responses of young respondents
is, in our opinion, symptomatic.
Thirdly, many informants of the youth group associate Ordnung with
cleanliness and tidying up, as evidenced by the following rather frequent
responses: Sauberkeit, Sauber, Aufräumen, Zimmer. Similar words are given
by representatives of the second age group, although much less frequently.
For the older generation, order is associated primarily with “mental” order
and a structured, well-organized life: Gedanken, Sicherheit im Leben. The
fact that in many questionnaires informants of this group provided not only
single words in response to the stimulus word but also detailed answers
confirms the idea that Ordnung is perceived by the oldest age group as an
immutable value.
Ich liebe sie, weil sie das eigene Leben und das der anderen erleichtert; sie
sollte anzustreben sein, um besser zu leben; notwendig, um in eigener
Umwelt bestehen zu können.
Fourthly, the associations of the youth group were more varied in terms of
semantics. In particular, the responses of this group contain the following
words, which are absent from the responses of the other two groups of
informants: Hierarchie, Gleichmäßigkeit, Planung, Organisation, Recht,
Organisiertheit, Struktur, Kalender, Eltern. The last response is probably
due to the fact that order is instilled by parents, and children's upbringing
begins, first of all, with the requirement to keep their own room clean.
These associations are thus closely connected with the frequent responses of
this group noted above, which denote cleaning and order. A connection with
cleaning is also characteristic of the responses of informants from the
second group, as evidenced by the following associations: Putzfrau,
Schreibtisch, Zimmer, Schrank.
It is noteworthy that, unlike synonymy, antonymy is not a relevant element
in the responses of any of the three groups of respondents. Antonymous
verbal responses like Chaos, Unordnung were registered only sporadically.
Thus, the concept of Ordnung, while remaining a cultural dominant, is
undergoing some changes in its content and value components. In particular,
it can be argued that for the younger generation this concept has a more
utilitarian, practical significance. The associations given by representatives
of the student-youth groups contain far fewer positive words, which indicates
a change in the axiological component of the analyzed concept. This is
supported by the results of the axiological survey, which are, in our opinion,
very significant in this respect. In particular, it was found that for the vast
majority of informants in the oldest group the Ordnung concept has a positive
connotation: 98% of respondents demonstrated a positive attitude towards
this concept.
The answers of the second group are less unambiguous: 56% defined
their attitude to order as positive and 44% as neutral.
The attitude of informants from the youth group to the Ordnung concept
proved to be the most ambivalent: a positive attitude to order was shown by 48%
Conclusions
The experiment allows us to conclude that cultural constants are dynamic
formations whose content may change, reflecting alterations in the value
systems specific to a particular ethnosociety. In this context, research on
the value component of linguocultural concepts is of particular importance,
as the results of such studies are relevant for establishing and sustaining
effective cross-cultural communication.
References
Karasik, V.I. 2004. Language Circle: Personality, Concepts, Discourse. Moscow,
Gnozis.
Bartmin'skiy, Ye. 2005. Language Image of the World: Essays on Ethnolinguistics.
Moscow, Indrik.
Medvedeva, T.S. 2007. Representation of Ordnung concept in German linguistic
picture of the world. Herald of Udmurskiy University. Philology, 5(2).
Ter-Minasova, S.G. 2007. War and Peace of Languages and Cultures: Theory and
Practice. Moscow, Astrel': Khranitel'.
Markowsky, R. 1995. Studienhalber in Deutschland: interkulturelles
Orientierungstraining für amerikanische Studenten, Schüler und Praktikanten.
Heidelberg, Asanger.
Matussek, M. 2006. Wir Deutschen. Warum uns die anderen gern haben können.
Frankfurt/Main, S. Fischer Verlag.
Wierzbicka, A. 1999. Semantic Universals and Language Description. Moscow,
Yazyki Russkoy Kul'tury.
Bausinger, H. 2002. Typisch deutsch. Wie deutsch sind die Deutschen? München,
Beck HG - Verlag.
Filled pauses and lengthenings detection using
machine learning techniques
Vasilisa Verkhodanova, Vladimir Shapranov, Alexey Karpov
SPIIRAS, Saint Petersburg, Russia
Abstract
This paper addresses the detection and classification of filled pauses and
lengthenings (FPs) in Russian using machine learning techniques, namely Extreme
Learning Machines (ELM). As parameters we use formants, energy variation and MFCC
coefficients. The experiments on FP detection and classification are carried out on
the combined material of the SPIIRAS task-based dialog corpus, Russian casual
conversations from the Binghamton Open Source Multi-Language Audio Database,
reports from appendix No. 5 to the phonetic journal “Bulletin of the Phonetic Fund”
belonging to the Department of Phonetics of Saint Petersburg University, and a small
part of the SWITCHBOARD corpus. To evaluate the experimental results we calculate
the F1 score. The best F1 score achieved was 0.42.
Key words: speech disfluencies, filled pauses, spontaneous speech processing,
Russian, ELM
Introduction
The need to detect speech disfluencies automatically emerged mainly from
the problems of automatic speech recognition (ASR): disfluencies are known
to affect ASR results, and since they can occur at any point in spontaneous
speech, they can lead to misrecognition or incorrect classification of
adjacent words. Since the INTERSPEECH 2013 Computational Paralinguistics
Challenge (ComParE) (ComParE, 2013), many works on filler detection using
different machine learning approaches have appeared, as ComParE raised
interest in the automatic detection of fillers by providing a standardised
corpus and a reference system.
Medeiros et al. (2013) focused on the detection of filled pauses based on
acoustic and prosodic features as well as some lexical features. Experiments
were carried out on Lectra, a speech corpus of university lectures in
European Portuguese. Several machine learning methods were applied, and the
best results were achieved using Classification and Regression Trees: for
detecting words inside disfluent sequences, performance was about 91%
precision and 37% recall when filled pauses and fragments were used as a
feature; without it, performance dropped to 66% precision and 20% recall.
Prylipko et al. (2014) presented a method for filled pause detection using
an SVM classifier, applying a Gaussian filter to infer temporal context
information and performing a morphological opening to filter false alarms.
The feature set was the same as that proposed for ComParE (ComParE, 2013),
extracted with the openSMILE toolkit (Eyben et al., 2010). Experiments were
carried out on the LAST MINUTE corpus of naturalistic multimodal recordings
of 133 German-speaking subjects in a so-called Wizard-of-Oz (WoZ)
experiment. The results obtained were a recall of 70%, a precision of 55%,
and an AUC of 0.94.
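Because the works cited above report precision and recall, while the present paper reports an F1 score, it may help to recall how the two are related: F1 is the harmonic mean of precision and recall. The following is a minimal sketch, not code from any of the cited systems; the function and variable names are illustrative, and the F1 values it produces for the cited precision/recall pairs are derived here rather than reported by those authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in the range 0..1)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both precision and recall are 0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the text above
prylipko = f1_score(0.55, 0.70)          # SVM with Gaussian filter and opening
medeiros_with = f1_score(0.91, 0.37)     # CART, FPs and fragments as a feature
medeiros_without = f1_score(0.66, 0.20)  # CART, without that feature

for name, value in [("Prylipko et al.", prylipko),
                    ("Medeiros et al. (with)", medeiros_with),
                    ("Medeiros et al. (without)", medeiros_without)]:
    print(f"{name}: F1 = {value:.3f}")
```

Since the corpora, languages and tasks differ, such derived values are only a rough point of orientation and not a direct comparison with the results reported here.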
Though evidence on filled pauses and lengthenings (henceforth jointly
referred to as FPs) differs across languages, genres, and speakers, on
average there are several disfluencies per 100 syllables, filled pauses being
the most frequent disfluency type (O’Connell et al., 2004). In Russian speech
FPs occur at a rate of about 4 per 100 words, and at approximately the same
rate inside clauses and at discourse boundaries (Kibrik et al., 2014). In this
paper we present the results of machine learning experiments on the detection
of FPs in a mixed corpus of Russian spontaneous speech of diverse quality,
with the addition of 20 minutes from SWITCHBOARD (Godfrey et al., 1992).
Corpus
The corpus we use for the experiments comprises varied material. There are
dialogs collected in St. Petersburg at the end of 2012 and the beginning of
2013 (Verkhodanova et al., 2014). This part consists of 18 dialogs of 1.5 to 5
minutes each, in which people in pairs carried out map and appointment tasks.
Participants were students, 6 women and 6 men aged 17 to 23, with technical
and humanities specializations. Recordings were manually annotated for
different types of disfluencies, FPs being the majority: 492 phenomena (222
filled pauses and 270 lengthenings). There are also recordings from the
Multi-Language Audio Database (Zahorian et al., 2011), which consists of
approximately 30 hours of varied, noisy and sometimes low-quality speech in
each of three languages (English, Mandarin Chinese, and Russian) taken from
open-source public websites such as http://youtube.com. From the Russian
part we took 6 random recordings of casual conversations (3 female and 3 male
speakers), which were manually annotated for FPs (284 FPs: 188 filled pauses
and 96 sound lengthenings). There are also 12 recorded scientific reports
(linguistics, logic, psychology, etc.) from appendix No. 5 to the phonetic
journal “Bulletin of the Phonetic Fund” belonging to the Department of
Phonetics of Saint Petersburg University (Dep. of Phonetics). All were
recorded in the 1970s-1980s in Moscow except one, which was recorded in
Prague. All speakers (6 men and 6 women) were native Russian speakers. The
number of manually annotated FPs is 285 (225 filled pauses and 60
lengthenings). Another part, added to make our corpus more diverse in
quality, consists of recordings from the SWITCHBOARD corpus (Godfrey et al.,
1992): 3 telephone dialogues,
Conclusion
In this paper we presented experiments on the detection of filled pauses and
lengthenings using acoustic-only features for machine learning classification
(Extreme Learning Machines). For the experiments we used diverse material
differing in quality, recording sites and situations. The feature set consisted
of 21 standard deviations (for F0, the first three formants, energy, voicing
probability and its derivative, and 14 MFCC coefficients) and 3 mean values
(for energy, voicing probability and its derivative). As a result, we achieved
an F1 score of 0.42.
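To make the composition of the feature vector concrete, the sketch below assembles the 21 standard deviations and 3 mean values for one segment from frame-level parameters. It is an illustration under stated assumptions and not the implementation used in the experiments: the parameter names, the per-frame dictionary layout, the MFCC numbering and the use of the population standard deviation are all hypothetical choices made here.

```python
from statistics import mean, pstdev

# Hypothetical names for the 21 frame-level parameters listed in the text:
# F0, three formants, energy, voicing probability, its derivative, 14 MFCCs.
PARAMS = (["f0", "formant1", "formant2", "formant3",
           "energy", "voicing", "voicing_delta"]
          + [f"mfcc{i}" for i in range(1, 15)])

# The three parameters for which a mean value is also included.
MEAN_PARAMS = ["energy", "voicing", "voicing_delta"]


def segment_features(frames):
    """Map one segment (a list of per-frame dicts) to a 24-value vector:
    21 standard deviations followed by 3 mean values."""
    # Population standard deviation is an assumption; the text does not
    # say which estimator was used.
    stds = [pstdev([frame[p] for frame in frames]) for p in PARAMS]
    means = [mean([frame[p] for frame in frames]) for p in MEAN_PARAMS]
    return stds + means


# Toy segment of two frames with made-up values (not real measurements)
frames = [{p: 1.0 for p in PARAMS}, {p: 3.0 for p in PARAMS}]
vector = segment_features(frames)
print(len(PARAMS), len(vector))
```

The emphasis on standard deviations fits the intuition that FPs are acoustically stable stretches of speech, so low within-segment variation is the cue a classifier would be expected to exploit.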
Acknowledgements
This research is supported by a grant from the Russian Foundation for Basic
Research (project No. 15-06-04465).
References
Akusok, A., Bjork, K. M., Miche, Y., Lendasse, A. 2015. High-performance
extreme learning machines: a complete toolbox for big data applications.
Access, IEEE, 3, 1011-1025.
ComParE INTERSPEECH: Computational Paralinguistic Challenge, 2013.
http://emotion-research.net/sigs/speech-sig/is13-compare
Department of Phonetics of Saint Petersburg University. http://phonetics.spbu.ru/
Prylipko, D., Egorow, O., Siegert, I., Wendemuth, A. 2014. Application of Image
Processing Methods to Filled Pauses Detection from Spontaneous Speech. In
Proc. of INTERSPEECH 2014, 1816-1820, Singapore.
Eyben, F., Wöllmer, M., Schuller, B. 2010. OpenSMILE: the Munich Versatile and
Fast Open-Source Audio Feature Extractor. In Proc. 18th ACM International
conference on Multimedia, 1459-1462.
O'Connell, D., Kowal, S. 2004. The History of Research on the Filled Pause as
Evidence of the Written Language Bias in Linguistics. Journal of
Psycholinguistic Research, vol. 33(6), 459-474.
Kibrik, A., Podlesskaya, V. (eds.). 2014. Rasskazy o Snovideniyah: Korpusnoye
Issledovaniye Ustnogo Russkogo Diskursa [Night dream stories: Corpus study
of Russian discourse], Litres.
Godfrey, J.J., Holliman, E.C., McDaniel, J. 1992. SWITCHBOARD: Telephone
Speech Corpus for Research and Development. In Proc. of International
Conference on Acoustics, Speech, and Signal Processing (ICASSP-92). vol. 1,
517-520.
Verkhodanova, V., Shapranov, V. 2014. Automatic Detection of Filled Pauses and
Lengthenings in the Spontaneous Russian Speech. In: Proc. 7th International
Conference Speech Prosody, 1110-1114, Dublin, Ireland.
Zahorian, S.A., Wu, J., Karnjanadecha, M., Vootkur, C.S., Wong, B., Hwang, A.,
Tokhtamyshev, E. 2011. Open-Source Multi-Language Audio Database for
Spoken Language Processing Applications. In Proc. INTERSPEECH 2011, pp.
1493-1496, Florence, Italy.
Boersma P., Weenink D. 2016. Praat: doing phonetics by computer [Computer
program]. Version 6.0.11, retrieved 20 January 2016 from http://www.praat.org/
Psycholinguistic evidence for the composite group
Irene Vogel, Angeliki Athanasopoulou
Department of Linguistics and Cognitive Science, University of Delaware, USA
Abstract
It is widely accepted that speech is phonologically structured in terms of
phonological constituents composing a Prosodic Hierarchy (PH). There is less
consensus, however, regarding the constituents themselves. We focus here on the
controversy surrounding a prosodic constituent between the Phonological Word and
the Phonological Phrase, the Clitic Group in (Nespor and Vogel 1986/2007). While
in some analyses it has been excluded, elsewhere it has been replaced by a revised
Composite Group (κ) (Vogel 2009). Here we present psycholinguistic data from
language acquisition and adult speech production that support the existence of κ
across languages.
Key Words: language acquisition, speech encoding, phonological word, composite
group
Introduction
The Composite Group (κ), which has replaced the Clitic Group, is the most
controversial constituent in the PH, and in fact, it is often excluded. The κ
consists of a Phonological Word (ω) and certain affixes and/or function
words, and possibly additional ωs in the case of compounds. It thus provides
a constituent between the Phonological Word (ω) and Phonological Phrase
(φ) which may serve as the domain of phonological phenomena across
languages. The data presented below provide independent support for the κ
based on two types of psycholinguistic studies, language acquisition and
language processing. We first discuss the acquisition of prosody in English
and Greek, and then speech processing studies in Dutch, Italian, Romanian,
and Nepali.
greenhouse vs. green house) are not fully acquired until the age of 11 years
or later (Athanasopoulou 2016, Shilling 2010, Vogel & Raimy 2002).
Interestingly, the production of phrasal stress is mastered after compound
stress. Thus, we can place the acquisition of compound stress between that of
the ω and φ, providing support for an intermediate κ constituent. The
acquisition order is thus as predicted: ω, κ, φ.
The acquisition of Greek compound (ω), clitic (κ), and phrasal stress (φ)
further supports the presence of κ in the PH. Stress in compounds (e.g.,
kokinomális “redhead”) is acquired first, at the age of 6 (Athanasopoulou
2016) and possibly earlier (Tzakosta & Manola 2012), while phrasal stress
(e.g., kókina maliá “red hair”) is acquired last (Athanasopoulou 2016). Clitic
stress (e.g., kípeló tis “her cup”; compare with kípelo “cup”) appears as early
as 2 years (Tzakosta 2004), but it is not fully acquired until later, crucially,
after compound stress and prior to phrasal stress (Athanasopoulou 2016).
This three-step acquisition sequence provides further support for the κ
constituent and matches the one we saw in English: ω, κ, φ.
Table 1 summarizes the findings regarding the order of acquisition of the
different prosodic patterns in English and Greek. The results support the
claim that prosodic development follows the PH and, crucially, they show
that the presence of the κ between the ω and φ is necessary to account for the
order of acquisition of these prosodic phenomena.
Overall, we see the same pattern: compounds behave like single words,
while clitics do not significantly increase the encoding time. One account of
this pattern is to reassess the structure of compounds as a single (recursive)
ω’ despite their internal composition of two ωs (Wheeldon and Lahiri
1997, 2002). This would not only alter the definition of prosodic
constituents, but would also obscure structural and other phonological
distinctions, resulting in serious drawbacks (Vogel 2009). On the other hand,
if the κ is included in the PH, the results can be accounted for simply,
avoiding these drawbacks: it is the number of κs, not ωs, that determines the
encoding time. As we can see in Table 2, this account yields the correct
predictions for all the structures, since the κ can have one ω (e.g., clitic
structures) or two ωs (e.g., compounds). Overall, we see that having a
constituent between ω and φ better explains the encoding time patterns
across languages.
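The difference between counting κs and counting ωs can be made explicit with a small sketch. It is purely illustrative and rests on assumptions: the class and function names are hypothetical, the English items are toy examples rather than stimuli from the cited studies, and the two-word phrase is assumed here to consist of two κs, each containing one ω.

```python
from dataclasses import dataclass, field


@dataclass
class CompositeGroup:
    """One κ: one or more phonological words (ω) plus optional clitics."""
    words: list
    clitics: list = field(default_factory=list)


def count_units(utterance):
    """Return (number of κs, number of ωs) for a list of CompositeGroup."""
    n_kappa = len(utterance)
    n_omega = sum(len(k.words) for k in utterance)
    return n_kappa, n_omega


# Toy structures (illustrative only)
single_word = [CompositeGroup(["house"])]
clitic_structure = [CompositeGroup(["house"], clitics=["the"])]
compound = [CompositeGroup(["green", "house"])]              # greenhouse
phrase = [CompositeGroup(["green"]), CompositeGroup(["house"])]  # green house

for label, utt in [("single word", single_word),
                   ("clitic structure", clitic_structure),
                   ("compound", compound),
                   ("phrase", phrase)]:
    kappas, omegas = count_units(utt)
    print(f"{label}: {kappas} κ, {omegas} ω")
```

Under these assumptions the single word, the clitic structure and the compound each contain one κ, although the compound contains two ωs. Counting κs therefore groups the compound with the single word, as in the pattern described above, whereas counting ωs would group it with the phrase.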
Conclusions
In the present paper, we synthesized the findings from several studies in
language acquisition and speech processing to assess the psychological
reality of the controversial κ constituent in the PH. The results from both
groups of studies demonstrate that the observed behaviors are best accounted
for if an intermediate constituent κ is included in the PH between ω and φ. Thus,
while there is theoretical controversy regarding the κ, psycholinguistic
References
Athanasopoulou, A. 2016. Prosodic development in Greek and English. University
of Delaware: Doctoral dissertation.
Demuth, K. and Fee, J. 1995. Minimal Words in early phonological development.
Brown University & Dalhousie University.
Kehoe, M., Stoel-Gammon, C., and Buder, E. 1995. Acoustic correlates of stress in
young children's speech. Journal of Speech and Hearing Research 38, 2,
338-350.
Koirala, C. 2012. The composite group as the units of speech production in Nepali.
Talk presented at the 33rd Annual Conf. of the Ling. Society of Nepal.
Levelt, W. 1989. Speaking: from intention to articulation. Cambridge, MA: MIT
Press.
Nespor, M. and Vogel, I. (1986/2007). Prosodic phonology. Dordrecht: Foris.
Shilling, H. 2010. Compound and phrasal stress acquisition: When a greenhouse
becomes different to a green house. University of Birmingham: MA dissertation.
Tzakosta, M. 2004. The acquisition of the clitic group in Greek. Proc. of the 24th
Annual Meeting of Greek Linguistics, 693-704. Thessaloniki, Greece: Faculty of
Philosophy, Aristotle University of Thessaloniki.
Tzakosta, M. and Manola, D. 2012. Perception and production of compounds by
preschool children: pedagogical consequences. In Malafantis et al. (eds.), Proc. of
the 7th Intern. Conf. of the Greek Pedagogical Soc. – Greek Pedagogy and
Educ. Research, vol. 2, 1119-30. Athens: Diadrasi.
Vigário, M. 2011. Prosodic structure between the prosodic word and the
phonological phrase: recursive nodes or an independent domain? The Linguistic
Review 27, 4, 485-530.
Vogel, I. 2009. The Status of the Clitic Group. In Grijzenhout, J. and Kabak, B.
(eds.), Phonological Domains: Universals and Deviations, 15-46. Berlin:
Mouton de Gruyter.
Vogel, I. and Raimy, E. 2002. The Acquisition of Compound vs. Phrasal Stress in
English. Journal of Child Language 29, 2, 225-50.
Vogel, I. and Spinu, L. 2009. The domain of palatalization in Romanian. In Masullo,
P., O’Rourke, E., and Huang, C. (eds.), Selected Papers from LSRL 37, 307-20.
Philadelphia: John Benjamins.
Vogel, I. and Wheeldon, L. 2010. Units of speech production in Italian. In Colina, S.,
Olarrea, A., and Carvalho, A. (eds.), Romance Linguistics 2009, 95-110.
Philadelphia: John Benjamins.
Wheeldon, L. and Lahiri, A. 1997. Prosodic Units in Speech Production. Journal of
Memory and Language 37, 356-81.
Wheeldon, L. and Lahiri, A. 2002. The minimal unit of phonological encoding:
prosodic or lexical word. Cognition 85, B31-B4
Index of names