

Neutralization of voicing distinction of stops in Tohoku dialects of Japanese: Field work and acoustic measurements


Ai Mizoguchi (1, 2), Ayako Hashimoto (3), Sanae Matsui (4), Setsuko Imatomi (5), Ryunosuke Kobayashi (4), and Mafuyu Kitahara (4)
(1) Maebashi Institute of Technology, Gunma, Japan
(2) National Institute for Japanese Language and Linguistics, Tokyo, Japan
(3) Tokyo Kasei-gakuin College, Tokyo, Japan
(4) Sophia University, Tokyo, Japan
(5) Mejiro University, Tokyo, Japan

Abstract
Research on Tohoku dialects, which are varieties of Japanese, has found that the voiceless stops /k/ and /t/ in the intervocalic position are frequently realized as voiced stops. However, the phenomenon has mainly been judged aurally in the Japanese linguistics literature and has not been confirmed by acoustic measurements. We measured the VOT of data originally collected in the survey of Tohoku dialects by [1]. The data used in this study include two age groups from eight sites. The results demonstrate that for word-medial stops, the VOT distributions of voiced and voiceless stops largely overlapped, whereas the laryngeal contrast was maintained for word-initial stops. Intervocalic voicing neutralization was thus confirmed by quantitative acoustic measurements. The effects of neighboring vowels were also investigated, showing that height, but not duration, had a significant effect on voicing neutralization. Our results shed light on the phonetic nature of Tohoku dialects as well as on their phonological structure, such as the role of voicing contrast.
Index Terms: intervocalic voicing, neutralization, Tohoku dialects, VOT

1. Introduction
Tohoku dialects are spoken in the Tohoku district, the northern part of Honshu (the mainland) in Japan. They show some salient characteristics in pronunciation that distinguish them from other dialects in Japan [2–4]. Tohoku dialects can be further divided into two groups depending on their properties: northern Tohoku dialects and southern Tohoku dialects.

The most prominent characteristic of the consonants found in Tohoku dialects is that the voiceless stops /k/ and /t/ are voiced intervocalically in both northern and southern Tohoku dialects. For example, /atama/ ‘head’ is pronounced as [adama] and /kaki/ ‘persimmon’ as [kaɡi]. By contrast, word-initial voiceless stops are not voiced. This fact suggests that voiceless stops are voiced intervocalically because they assimilate to the voicing of neighboring vowels, which are fundamentally voiced, and that they are not voiced word-initially because there is no vowel before them. As a result of the intervocalic voicing of voiceless stops, certain words are merged in the southern Tohoku region. That is, /ito/ ‘string’ is pronounced as [ido] and /ido/ ‘well’ as [ido]. By contrast, such a merger is avoided in northern Tohoku dialects because the voiced obstruents /b/, /d/, and /z/ are pre-nasalized intervocalically as [mb], [nd], and [ndz], respectively. As for /ɡ/, a velar stop, it is fully nasalized as [ŋ]. Since /ito/ is pronounced as [ido] and /ido/ as [indo] in northern Tohoku dialects, the distinction is maintained.

[2] surveyed the sounds of Tohoku dialects and suggested effects of adjacent vowels on intervocalic voicing. In terms of acoustic measurements, VOT, which is a widely used acoustic metric of voicing [5–8], was investigated for the word-initial stops in Tohoku dialects, showing a bimodal VOT distribution for voiced and voiceless stops [9]. Regarding the utterance position, word-initial stops in an isolated utterance tend to have a longer VOT than those in a sentence in various languages [5, 6, 10]. Word-medial stops tend to have a greater voiced portion during the closure in American English than word-initial stops do [11]. However, few studies have investigated the VOT of word-medial stops in a language that features voicing neutralization.

In this study, we measured the VOT of the data collected in the recent survey by [1], which was conducted to investigate present-day Tohoku dialects, and confirmed intervocalic voicing neutralization using quantitative acoustic measurements.

2. Method
The speech data for the present study were originally collected for a project on the phonological descriptions of Tohoku dialects [1]. A brief overview of the recording in the project and a description of the portion of data used in the present study, along with the acoustic and statistical analyses, are given in this section.

2.1. Recording sites
The left panel in Figure 1 shows the map of the Tohoku district in Japan. The recording sites in the survey by [1] are shown in the right panel. To capture the comprehensive characteristics of Tohoku dialects, distinct sites with sufficient distance from the geographical, historical, and cultural points of view were chosen.
Figure 1: Left: Tohoku district in Japan. Right: Eleven recording sites in the survey [1] (Aomori, Hirosaki, Hachinohe, Akita, Morioka, Kamaishi, Ichinoseki, Mikawamachi, Tsuruoka, Sendai, Aizu-wakamatsu).

2.2. Participants
Tohoku dialects were recorded from 2012 to 2016 in 11 sites covering all the six prefectures in the Tohoku district. The total number of recorded speakers was 61, whose ages ranged from 10 to 92. In the present study, the data on 24 speakers from eight sites in two age groups (69–85 years and 33–53 years) were analyzed. Table 1 summarizes the eight sites, age groups, and number of speakers.

Table 1: Number of participants in each site.
Site              69–85 years   33–53 years
Aomori            0             2
Hachinohe         2             0
Akita             2             2
Morioka           1             2
Kamaishi          2             0
Ichinoseki        2             1
Tsuruoka          2             2
Aizu-wakamatsu    2             2

2.3. Materials and procedures
The picture task, which involved 48 photos or illustrations presented to participants sequentially, was conducted by one of the authors. Participants were asked to pronounce the name of the depicted object in their usual spoken language. The recording was done at town halls or the participant’s home using a SONY ECM-MS957 microphone on a SONY PCM-D50 recorder (16 bit, 44 kHz).

Words containing coronal or velar stops (/t/, /d/, /k/, /ɡ/) were selected for acoustic analysis. Table 2 presents the set of expected words shown to participants by photos or illustrations. Even when participants produced a different word from the one expected, if it contained a coronal or velar stop, the words were included in the analysis. As a result, the total number of analyzed words was 84.

Table 2: Word list.
Place     Position   Words
Coronal   Initial    tokei, takigi, daikon
Coronal   Medial     gitaa, kutsushita, hata, natto, mado, budoo
Coronal   Both       tomato
Velar     Initial    kutsu, kutsushita, kisha, kusa, kuchibashi, kujira, gitaa
Velar     Medial     tsuki, suki, fuki, yuki, takigi, azuki, oke, tokei, yakan, mikan, okashi, suika, shika, chikarakobu, daikon, neko, baiku, omikuji, tsukushi, hoshigaki, nagagutsu
Velar     Both       kamakiri, kaki, kiku

2.4. Acoustic measurements
The words were transcribed and segmented using Praat [12] by trained phoneticians and checked by another phonetician in our group. The target stops were visually identified by the clear existence of a closure and the following burst. The start of the voicing was identified by the visible concentration of energy below the 1 kHz range. Only the segments with a clear burst and voicing were used to measure the VOT. In other words, those without a visible burst or with a devoiced vowel were omitted from further analyses. Vowels were identified by a visible structure of F2 and F3, which did not necessarily coincide with the start or end of the voicing. The duration and pitch of the surrounding vowels were also measured. Figure 2 shows an example utterance in which the burst and voice onset for the word-initial stop were measured and those for the word-medial stop were not measured due to continuing voicing throughout the closure.

Figure 2: Examples of the measured burst (b) and voice onset (v) for [t] in /tokei/. The burst and voice onset were not measured for [k] due to continuing voicing throughout the closure (<cl>).

2.5. Statistical analyses
A linear mixed-effects model analysis was conducted in R [13], using the lme4 [14] and lmerTest [15] packages. The models were selected using a step-down model building process and the log-likelihood comparisons.
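
As a rough illustration of the measurement criteria described in Section 2.4, the sketch below computes VOT as the interval from the burst to the onset of voicing and skips tokens that cannot be measured. It is not the authors' procedure: the StopToken record, its field names, and the time stamps are hypothetical stand-ins for values that would be read from Praat annotations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopToken:
    """One annotated stop (hypothetical format); times are in seconds."""
    phoneme: str                           # e.g. "t", "d", "k", "g"
    position: str                          # "initial" or "medial"
    burst: Optional[float]                 # None if no visible burst
    voice_onset: Optional[float]           # None if the following vowel is devoiced
    voiced_through_closure: bool = False   # voicing continues from the previous vowel


def vot_ms(token: StopToken) -> Optional[float]:
    """Return VOT in milliseconds, or None if the token is excluded."""
    if token.voiced_through_closure:
        # Fully pre-voiced: no separate voice onset can be identified.
        return None
    if token.burst is None or token.voice_onset is None:
        # No visible burst, or a devoiced vowel: omitted from analysis.
        return None
    return (token.voice_onset - token.burst) * 1000.0


# Made-up tokens, for illustration only.
tokens = [
    StopToken("t", "initial", burst=0.512, voice_onset=0.560),
    StopToken("k", "medial", burst=None, voice_onset=None,
              voiced_through_closure=True),
    StopToken("k", "medial", burst=1.204, voice_onset=None),
]

measured = [round(v, 1) for v in map(vot_ms, tokens) if v is not None]
n_fully_voiced = sum(t.voiced_through_closure for t in tokens)
```

Counting the fully voiced tokens separately, rather than dropping them silently, makes it possible to report how many word-medial cases were excluded for each segment.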
3. Results

3.1. VOT by segment position
We measured and analyzed the voice onset time of the word-initial and -medial stops /t, d, k, ɡ/ in 84 varieties of stimulus words uttered by 24 native speakers of Tohoku dialects. This resulted in 830 target segments. For the word-medial position, the cases in which the voicing from the previous vowel continued throughout the closure were excluded from the analysis because the voice onset for the target segment could not be identified. Due to this measurement difficulty and the selection of the stimulus words, the numbers included in the analysis became imbalanced between the segments and segment positions (Table 3). Thus, the results, especially those from the word-medial /ɡ/, should be taken only as a reference.

Table 3: The number of target segments included in the analysis, VOT and its SD, and the number of fully pre-voiced segments in the word-medial position excluded from the analysis.
Segment        N     VOT mean (ms)   SD      Fully pre-voiced
Word-initial
/t/            36    47.56           15.98   -
/d/            23    14.22           9.00    -
/k/            179   68.05           25.99   -
/ɡ/            21    22.95           23.97   -
Word-medial
/t/            117   19.27           9.34    3
/d/            24    15.67           8.24    37
/k/            426   37.47           19.10   59
/ɡ/            4     26.00           4.08    8

Figure 3 shows the VOT of the word-initial and -medial /t, d, k, ɡ/ uttered by the 24 native speakers of Tohoku dialects.

Figure 3: Word-initial and -medial /t, d, k, ɡ/ VOT.

The selected mixed-effects linear regression model predicting the VOT from the fixed effects of the interactions of voicing contrast (voiced/voiceless), place of articulation (alveolar/velar), and segment position (initial/medial), and the segment duration and the following segment, with random intercepts of speakers and stimulus words, revealed that the interaction between voicing and place of articulation had a significant effect (β=-30.46, t=-3.35, p<.01). A pairwise post hoc test comparing the estimated means showed that the VOT of the word-initial /t/ was significantly longer than that of the word-medial /t/ (p<.001) and that the VOT of the word-medial /t/ was not significantly different from that of the word-medial /d/ (p=.98). In addition, the VOT of the word-initial /k/ was significantly longer than that of the word-medial /k/ (p<.001).

Figures 4 and 5 show the case number normalized histograms of the word-initial (Figure 4) and -medial (Figure 5) VOT of /t/, /d/, /k/, and /ɡ/. For the word-initial stops, the VOT distribution looks bimodal with a slight overlap in the middle (Figure 4). By contrast, for the word-medial stops, the VOT does not differ greatly between the voiced and voiceless stops (Figure 5).

Figure 4: Normalized word-initial /t, d, k, ɡ/ VOT.

Figure 5: Normalized word-medial /t, d, k, ɡ/ VOT.

3.2. Effects on the VOT of the word-medial /t/ and /k/
The effects on the VOT of the word-medial /t/ and /k/ were further explored and a mixed-effects model was chosen, which consisted of the fixed effects of the place of articulation, stimulus word duration, previous segment, and following segment, and a random intercept of speakers. Table 4 shows the
significant effects of the previous and following segments on the VOT of the word-medial /t/ and /k/.

Table 4: Significant fixed effects in the mixed-effects model on the VOT of the word-medial /t/ and /k/.
Predictor        Estimate β (ms)   t-value   p-value
previous [e]     16.00             3.48      <.001
previous [i]     8.68              2.99      <.01
previous [o]     9.17              1.99      <.05
previous [u̥]     -7.17             -2.28     <.05
following [aː]   -10.58            -2.10     <.05
following [i]    20.74             6.55      <.001

To narrow the adjacent vowel effect, the vowel durations were used as the fixed effects (vot ~ place + stimulus word duration + previous/following segment * previous/following segment duration + (1|speaker)). The models showed no significant effects of adjacent vowel durations.

The f0 values at the 3/4 time points for the previous vowel and at the 1/4 time points for the following vowel were also used and the model (vot ~ place of articulation + stimulus word duration + previous/following vowel * previous/following vowel f0 * speaker’s sex + previous/following vowel * previous/following vowel f0 * speaker’s sex + (1|speaker)) showed that for male speakers, f0 had a significant effect on VOT when the previous vowel was [e] (β=-.90, t=-2.07, p<.05) and [i] (β=.70, t=2.83, p<.01). It also showed a significant effect on VOT when the following vowel was [e] (β=-.19, t=-2.49, p<.05) and [oː] (β=-.77, t=-2.28, p<.05).

4. Discussion

4.1. Intervocalic voicing neutralization
Figure 3 and the linear mixed-effects analysis showed that the VOTs of /t/ and /k/ were significantly shorter in the word-medial position than in the word-initial position. The VOT of the word-medial /t/ did not differ significantly from that of the word-medial /d/. The comparison between the word-medial /k/ and /ɡ/ was not reported here due to the small number of measurable cases for the word-medial /ɡ/, but 59 cases of the word-medial /k/ were produced with full pre-voicing (Table 3). In addition, Figure 5 shows little variability across the voiced and voiceless stops and an overlap of their VOT in the word-medial position. These results showed that the voicing contrast tended to disappear in the word-medial position in these dialects. This neutralization has long been acknowledged as characteristic of Tohoku dialects in the Japanese linguistics literature [2–4], and the current results confirmed this well-known phenomenon using quantitative measurements.

4.2. Effects of adjacent vowels
Table 4 shows the effects of adjacent vowels on the VOT of the word-medial voiceless stops. If the previous vowel was a high vowel [i], or a mid vowel [e] or [o], the VOT was predicted to be longer than the baseline mean value. If the previous vowel was [u̥] (devoiced /u/), the VOT was predicted to be shorter than the baseline. As for the following vowel, if it was [aː] (long /a/), the VOT was shorter than the baseline and if it was [i], the VOT was longer than the baseline.

VOT is affected by the phonological context [2, 16]. According to [16], VOTs are longer before high vowels than before mid or low vowels. Indeed, if the vocal fold tension is high, as occurs with high vowels, voicing is difficult and thus the VOT becomes longer [17]. This explanation is mostly compatible with our finding that when the following vowel was a high vowel [i], the VOT was longer, whereas if it was a low vowel [aː], the VOT was shorter. However, not all the high and low vowels used in the current experiment showed the same significant effects. It also needs to be examined further why the devoiced /u/ was followed by a stop with a shorter VOT. To narrow the adjacent vowel effects, vowel duration and f0 were added into the analyses. The results showed no effects of vowel duration on the VOT irrespective of the vowels. In American English, the previous vowel duration is longer before voiced consonants than before voiceless consonants [18, 19]. This difference in previous vowel duration in American English plays a role in distinguishing the following voiced/voiceless contrast. However, for the Tohoku dialects in the current study, although the word-medial voicing contrast was often neutralized, the previous vowel duration was not relevant for the VOT. As described in the Introduction, word-medial voiced and voiceless stops are distinguished by the pre-nasalization of voiced stops in the northern varieties of Tohoku dialects, but not in the southern varieties.

In terms of f0, although a few significant effects were seen, the pitch accent, which is lexically assigned to each word in Japanese, was not controlled for the stimulus words and this had a large effect on vowels. Thus, it is not ideal to include this in the interpretation at this point. However, in discussing the vocal fold tension, it will be necessary to see the f0 effect on the VOT.

A closer look at vowel quality, including vowel duration and f0, with more controlled stimuli may lead to a discussion on whether voicing neutralization occurs due to phonological and/or physiological factors.

5. Conclusion
The intervocalic voicing neutralization in Tohoku dialects is a widely known phenomenon. However, quantitative measurements and statistical analyses have long been lacking. In the future, it will be necessary to analyze more data from all sites across the Tohoku district to investigate its sociolinguistic aspects. Analyzing more data will also provide better statistical power by increasing the measurable target segments, especially voiced ones. In addition, it will be necessary to closely examine fully pre-voiced cases with a negative VOT, which could not be incorporated into our analysis. Finally, because word-medial voicing is diminishing among the younger generation [2], it is desirable to explore different age groups and the socio-demographic aspects of these dialects.

6. Acknowledgements
We are truly grateful to the participants of the survey of Tohoku dialects. This research was supported by JSPS Kakenhi 24520438.

7. References
[1] A. Hashimoto, “Tohoku hougen ni okeru kagyou tagyou shi’in no
yuuseika ni tsuite: Tohoku hougen onsei chousa kara Ⅱ [On the
voicing of /k/ and /t/ in Tohoku dialects: A survey on sounds of
Tohoku dialects II],” Tsuda Journal of Language and Culture, 34,
pp. 88–102, 2019.
[2] J. Ohashi, Tohoku hougen onsei no kenkyuu [Research on sounds
of Tohoku dialects], Tokyo: Oufuu, 2002.
[3] M. Shibatani, The Languages of Japan. Cambridge: Cambridge
University Press, 1990.
[4] N. Tsujimura, An Introduction to Japanese Linguistics. Oxford:
Blackwell Publishers Inc, 1996.
[5] L. Lisker, and A. S. Abramson, “A cross-language study of
voicing in initial stops: Acoustical measurements,” Word, 20, pp.
384–422, 1964.
[6] L. Lisker, and A. S. Abramson, “Some effects of context on voice
onset time in English stops,” Language and Speech, 10, pp. 1–28,
1967.
[7] T. Cho, and L. Ladefoged, “Variation and universals in VOT:
Evidence from 18 languages,” Journal of Phonetics, 27, pp. 207–
229, 1999.
[8] A. S. Abramson, and D. Whalen, “Voice Onset Time (VOT) at
50: Theoretical and practical issues in measuring voicing
distinctions,” Journal of Phonetics, 63, pp. 75–86, 2017.
[9] M. Takada, Nihongo no gotou heisa’on no kenkyuu: VOT no
kyoujiteki bunpu to tsuujiteki henka [Research on the word-initial
stops of Japanese: Synchronic distribution and diachronic change
in VOT], Tokyo: Kurosio, 2011.
[10] G. J. Docherty, The timing of voicing in British English
obstruents, Berlin: Foris Publications, 1992.
[11] L. Davidson, “Variability in the implementation of voicing in
American English obstruents,” Journal of Phonetics, 54, pp. 35–
50, 2016.
[12] P. Boersma, and D. Weenink, Praat: doing phonetics by computer.
Version 6.1.08, retrieved 21 December 2019 from
http://www.praat.org/
[13] R Core Team, “R: A language and environment for statistical
computing,” R Foundation for Statistical Computing, Vienna,
Austria, 2019. URL: https://www.R-project.org/.
[14] D. Bates, M. Maechler, B. Bolker, and S. Walker, “Fitting Linear
Mixed-Effects Models Using lme4,” Journal of Statistical
Software, 67(1), pp. 1–48, 2015.
[15] A. Kuznetsova, P. B. Brockhoff, and R. H. B. Christensen, “lmerTest
package: tests in linear mixed effects models,” Journal of
Statistical Software, 82(13), 2017.
[16] D. H. Klatt, “Voice Onset Time, frication and aspiration in word-
initial consonant clusters,” Journal of Speech and Hearing
Research, 18, pp. 686–706, 1975.
[17] P. Auzou, C. Ozsancak, R. J. Morris, M. Jan, F. Eustache, and D.
Hannequin, “Voice onset time in aphasia, apraxia of speech and
dysarthria: a review,” Clinical Linguistics & Phonetics, vol.14,
no. 2, pp. 131–150, 2000.
[18] A. S. House, “On Vowel Duration in English,” Journal of the
Acoustical Society of America, 33, pp.1174–1178, 1961.
[19] L. J. Raphael, “Preceding vowel duration as a cue to the
perception of the voicing characteristic of word‐final consonants
in American English,” Journal of the Acoustical Society of
America, 51, pp. 1296–1303, 1972.
