Neutralization of voicing distinction of stops in Tohoku dialects of
Japanese: Field work and acoustic measurements
Ai Mizoguchi (1, 2), Ayako Hashimoto (3), Sanae Matsui (4), Setsuko Imatomi (5), Ryunosuke Kobayashi (4), and Mafuyu Kitahara (4)
(1) Maebashi Institute of Technology, Gunma, Japan
(2) National Institute for Japanese Language and Linguistics, Tokyo, Japan
(3) Tokyo Kasei-gakuin College, Tokyo, Japan
(4) Sophia University, Tokyo, Japan
(5) Mejiro University, Tokyo, Japan
Abstract

Research on Tohoku dialects, which are varieties of Japanese, has found that the voiceless stops /k/ and /t/ in intervocalic position are frequently realized as voiced stops. However, the phenomenon has mainly been judged aurally in the Japanese linguistics literature and has not been confirmed by acoustic measurements. We measured the voice onset time (VOT) of data originally collected in the survey of Tohoku dialects by [1]. The data used in this study include two age groups from eight sites. The results demonstrate that for word-medial stops, the VOT distributions of voiced and voiceless stops largely overlapped, whereas the laryngeal contrast was maintained for word-initial stops. Intervocalic voicing neutralization was thus confirmed by quantitative acoustic measurements. The effects of neighboring vowels were also investigated, showing that height, but not duration, had a significant effect on voicing neutralization. Our results shed light on the phonetic nature of Tohoku dialects as well as on their phonological structure, such as the role of voicing contrast.

Index Terms: intervocalic voicing, neutralization, Tohoku dialects, VOT

1. Introduction

Tohoku dialects are spoken in the Tohoku district, the northern part of Honshu (the mainland) in Japan. They show some salient characteristics in pronunciation that distinguish them from other dialects in Japan [2–4]. Tohoku dialects can be further divided into two groups depending on their properties: northern Tohoku dialects and southern Tohoku dialects.

The most prominent characteristic of the consonants found in Tohoku dialects is that the voiceless stops /k/ and /t/ are voiced intervocalically in both northern and southern Tohoku dialects. For example, /atama/ ‘head’ is pronounced as [adama] and /kaki/ ‘persimmon’ as [kaɡi]. In contrast, word-initial voiceless stops are not voiced. This fact suggests that voiceless stops are voiced intervocalically because they assimilate to the voicing feature of neighboring vowels, which are fundamentally voiced, and that they are not voiced word-initially because there is no vowel before them. As a result of the intervocalic voicing of voiceless stops, certain words are merged in the southern Tohoku region. That is, /ito/ ‘string’ is pronounced as [ido] and /ido/ ‘well’ as [ido]. In contrast, such a merger is avoided in northern Tohoku dialects because the voiced obstruents /b/, /d/, and /z/ are pre-nasalized intervocalically as [mb], [nd], and [ndz], respectively. As for /ɡ/, a velar stop, it is fully nasalized as [ŋ]. Since /ito/ is pronounced as [ido], while /ido/ is pronounced as [indo] in northern Tohoku dialects, the distinction is maintained.

[2] surveyed the sounds of Tohoku dialects and suggested the effects of adjacent vowels on intervocalic voicing. In terms of acoustic measurements, VOT, which is a widely used acoustic metric of voicing [5–8], was investigated for the word-initial stops in Tohoku dialects, showing a bimodal VOT distribution for voiced and voiceless stops [9]. Regarding the utterance position, the word-initial stops in an isolated utterance tend to have a longer VOT than the ones in a sentence in various languages [5, 6, 10]. Word-medial stops tend to have a greater voiced portion during the closure in American English than word-initial stops do [11]. However, few studies have investigated the VOT of word-medial stops in a language that features voicing neutralization.

In this study, we measured the VOT of the data collected in the recent survey by [1], which was conducted to investigate present-day Tohoku dialects, and confirmed intervocalic voicing neutralization using quantitative acoustic measurements.

2. Method

The speech data for the present study were originally collected for a project on the phonological descriptions of Tohoku dialects [1]. A brief overview of the recording in the project and the description of the portion of data in the present study along with the acoustic and statistical analyses are given in this section.

2.1. Recording sites

The left panel in Figure 1 shows the map of the Tohoku district in Japan. The recording sites in the survey by [1] are shown in the right panel. To capture the comprehensive characteristics of Tohoku dialects, distinct sites with sufficient distance from the geographical, historical, and cultural points of view were chosen.

Figure 1: Left: Tohoku district in Japan. Right: Eleven recording sites in the survey [1] (Aomori, Hirosaki, Hachinohe, Morioka, Akita, Kamaishi, Ichinoseki, Mikawamachi, Tsuruoka, Sendai, and Aizu-wakamatsu).

2.2. Participants

Tohoku dialects were recorded from 2012 to 2016 in 11 sites covering all six prefectures in the Tohoku district. The total number of recorded speakers was 61, whose ages ranged from 10 to 92. In the present study, the data on 24 speakers from eight sites in two age groups (69–85 years and 33–53 years) were analyzed. Table 1 summarizes the eight sites, age groups, and number of speakers.

Table 1: Number of participants in each site.

Site             69–85 years   33–53 years
Aomori           0             2
Hachinohe        2             0
Akita            2             2
Morioka          1             2
Kamaishi         2             0
Ichinoseki       2             1
Tsuruoka         2             2
Aizu-wakamatsu   2             2

Table 2: Word list.

Coronal
  Initial: tokei, takigi, daikon
  Medial:  gitaa, kutsushita, hata, natto, mado, budoo
  Both:    tomato
Velar
  Initial: kutsu, kutsushita, kisha, kusa, kuchibashi, kujira, gitaa
  Medial:  tsuki, suki, fuki, yuki, takigi, azuki, oke, tokei, yakan, mikan, okashi, suika, shika, chikarakobu, daikon, neko, baiku, omikuji, tsukushi, hoshigaki, nagagutsu
  Both:    kamakiri, kaki, kiku

2.4. Acoustic measurements

The words were transcribed and segmented using Praat [12] by trained phoneticians and checked by another phonetician in our group. The target stops were visually identified by the clear existence of a closure and the following burst. The start of the voicing was identified by the visible concentration of energy below the 1 kHz range. Only the segments with a clear burst and voicing were used to measure the VOT. In other words, those without a visible burst or with a devoiced vowel were omitted from further analyses. Vowels were identified by a visible structure of F2 and F3, which did not necessarily coincide with the start or end of the voicing. The duration and pitch of the surrounding vowels were also measured. Figure 2 shows an example utterance in which the burst and voice onset for the word-initial stop were measured and those for the word-medial stop were not measured due to continuing voicing throughout the closure.
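The VOT measurement described in Section 2.4 can be illustrated with a minimal sketch, assuming that the burst and voice-onset landmarks have already been annotated as time stamps in seconds (for example, exported from Praat). The Token class, its field names, and the example times are hypothetical and are not taken from the survey data; the sketch only treats VOT as the interval from the burst to the voice onset and leaves tokens without both landmarks unmeasured, as in the [k] of /tokei/ in Figure 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    """One annotated stop. All field names are hypothetical illustrations."""
    word: str
    segment: str
    burst_s: Optional[float]        # burst time in seconds; None if no visible burst
    voice_onset_s: Optional[float]  # start of voicing in seconds; None if not identifiable


def vot_ms(token: Token) -> Optional[float]:
    """Return VOT in milliseconds (voice onset minus burst), or None if unmeasurable."""
    if token.burst_s is None or token.voice_onset_s is None:
        return None
    return (token.voice_onset_s - token.burst_s) * 1000.0


# Invented landmark times, for illustration only.
tokens = [
    Token("tokei", "t", burst_s=0.250, voice_onset_s=0.300),  # measurable word-initial stop
    Token("tokei", "k", burst_s=None, voice_onset_s=None),    # voicing continues through closure
]

# Map each (word, segment) pair to its VOT, keeping None for excluded tokens.
measured = {(t.word, t.segment): vot_ms(t) for t in tokens}
```

Returning None rather than zero for unmeasurable tokens keeps them out of any later mean or model without silently biasing the VOT distribution toward short values.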
2.3. Materials and procedures

The picture task, which involved 48 photos or illustrations presented to participants sequentially, was conducted by one of the authors. Participants were asked to pronounce the name of the depicted object in their usual spoken language. The recording was done at town halls or the participant’s home using a SONY ECM-MS957 microphone on a SONY PCM-D50 recorder (16 bit, 44 kHz).

Words containing coronal or velar stops (/t/, /d/, /k/, /ɡ/) were selected for acoustic analysis. Table 2 presents the set of expected words shown to participants by photos or illustrations. Even when participants produced a different word from the one expected, the word was included in the analysis if it contained a coronal or velar stop. As a result, the total number of analyzed words was 84.

Figure 2: Examples of the measured burst (b) and voice onset (v) for [t] in /tokei/. The burst and voice onset were not measured for [k] due to continuing voicing throughout the closure (<cl>).

2.5. Statistical analyses

A linear mixed-effects model analysis was conducted in R [13], using the lme4 [14] and lmerTest [15] packages. The models were selected using a step-down model building process and log-likelihood comparisons.

3. Results

3.1. VOT by segment position

We measured and analyzed the voice onset time of the word-initial and -medial stops /t, d, k, ɡ/ in 84 different words uttered by 24 native speakers of Tohoku dialects. This resulted in 830 target segments. For the word-medial position, the cases in which the voicing from the previous vowel continued throughout the closure were excluded from the VOT analysis because the voice onset for the target segment could not be identified. Due to this measurement difficulty and the selection of the stimulus words, the numbers included in the analysis became imbalanced between the segments and segment positions (Table 3). Thus, the results, especially those from the word-medial /ɡ/, should be regarded as for reference only.

Table 3: The number of target segments included in the analysis, VOT and its SD, and the number of fully pre-voiced segments in the word-medial position excluded from the analysis.

Position      Segment  N    VOT mean (ms)  SD     Fully pre-voiced
Word-initial  /t/      36   47.56          15.98  -
Word-initial  /d/      23   14.22          9.00   -
Word-initial  /k/      179  68.05          25.99  -
Word-initial  /ɡ/      21   22.95          23.97  -
Word-medial   /t/      117  19.27          9.34   3
Word-medial   /d/      24   15.67          8.24   37
Word-medial   /k/      426  37.47          19.10  59
Word-medial   /ɡ/      4    26.00          4.08   8

Figure 3 shows the VOT of the word-initial and -medial /t, d, k, ɡ/ uttered by the 24 native speakers of Tohoku dialects.

Figure 3: Word-initial and -medial /t, d, k, ɡ/ VOT.

The selected mixed-effects linear regression model predicting the VOT from the fixed effects of the interactions of voicing contrast (voiced/voiceless), place of articulation (alveolar/velar), and segment position (initial/medial), and the segment duration and the following segment, with random intercepts of speakers and stimulus words, revealed that the interaction between voicing and place of articulation had a significant effect (β=-30.46, t=-3.35, p<.01). A pairwise post hoc test comparing the estimated means showed that the VOT of the word-initial /t/ was significantly longer than that of the word-medial /t/ (p<.001) and that the VOT of the word-medial /t/ was not significantly different from that of the word-medial /d/ (p=.98). In addition, the VOT of the word-initial /k/ was significantly longer than that of the word-medial /k/ (p<.001).

Figures 4 and 5 show the case number normalized histograms of the word-initial (Figure 4) and -medial (Figure 5) VOT of /t/, /d/, /k/, and /ɡ/. For the word-initial stops, the VOT distribution looks bimodal with a slight overlap in the middle (Figure 4). By contrast, for the word-medial stops, there is no large variability of VOT across the voiced and voiceless stops (Figure 5).

Figure 4: Normalized word-initial /t, d, k, ɡ/ VOT.

3.2. Effects on the VOT of the word-medial /t/ and /k/

The effects on the VOT of the word-medial /t/ and /k/ were further explored and a mixed-effects model was chosen, which consisted of the fixed effects of the place of articulation, stimulus word duration, previous segment, and following segment, and a random intercept of speakers. Table 4 shows the significant effects of the previous and following segments on the VOT of the word-medial /t/ and /k/.

Table 4: Significant fixed effects in the mixed-effects model on the VOT of the word-medial /t/ and /k/.

Predictor       Estimate β (ms)  t-value  p-value
previous [e]    16.00            3.48     <.001
previous [i]    8.68             2.99     <.01
previous [o]    9.17             1.99     <.05
previous [u̥]    -7.17            -2.28    <.05
following [aː]  -10.58           -2.10    <.05
following [i]   20.74            6.55     <.001

To narrow the adjacent vowel effect, the vowel durations were used as the fixed effects (vot ~ place + stimulus word duration + previous/following segment * previous/following segment duration + (1|speaker)). The models showed no significant effects of adjacent vowel durations.

The f0 values at the 3/4 time points for the previous vowel and at the 1/4 time points for the following vowel were also used and the model (vot ~ place of articulation + stimulus word duration + previous/following vowel * previous/following vowel f0 * speaker’s sex + previous/following vowel * previous/following vowel f0 * speakers’ sex + (1|speaker)) showed that for male speakers, f0 had a significant effect on VOT when the previous vowel was [e] (β=-.90, t=-2.07, p<.05) and [i] (β=.70, t=2.83, p<.01). It also showed a significant effect on VOT when the following vowel was [e] (β=-.19, t=-2.49, p<.05) and [oː] (β=-.77, t=-2.28, p<.05).

4. Discussion

4.1. Intervocalic voicing neutralization

Figure 3 and the linear mixed-effects analysis showed that the VOTs of /t/ and /k/ were significantly shorter in the word-medial position than in the word-initial position. The VOT of the word-medial /t/ did not differ significantly from that of the word-medial /d/. The comparison between the word-medial /k/ and /ɡ/ was not reported here due to the small number of measurable cases for the word-medial /ɡ/, but 59 cases of the word-medial /k/ were produced with full pre-voicing (Table 3). In addition, Figure 5 shows less variability across the voiced and voiceless stops and overlap of the VOT in the word-medial position. These results showed that the voicing contrast tended to disappear in the word-medial position in these dialects. This neutralization has long been acknowledged as characteristic of Tohoku dialects in the Japanese linguistics literature [2–4], and the current results confirmed this well-known phenomenon using quantitative measurements.

4.2. Effects of adjacent vowels

Table 4 shows the effects of adjacent vowels on the VOT of the word-medial voiceless stops. If the previous vowel was a high vowel [i], or a mid vowel [e] or [o], the VOT was predicted to be longer than the baseline mean value. If the previous vowel was [u̥] (devoiced /u/), the VOT was predicted to be shorter than the baseline. As for the following vowel, if it was [aː] (long /a/), the VOT was shorter than the baseline and if it was [i], the VOT was longer than the baseline.

VOT is affected by the phonological context [2, 16]. According to [16], VOTs are longer before high vowels than before mid or low vowels. Indeed, if the vocal fold tension is high, as occurs with high vowels, voicing is difficult and thus the VOT becomes longer [17]. This explanation is mostly compatible with our finding that when the following vowel was a high vowel [i], the VOT was longer, whereas if it was a low vowel [aː], the VOT was shorter. However, not all the high and low vowels used in the current experiment showed the same significant effects. It also needs to be examined further why a preceding devoiced /u/ was associated with a shorter VOT of the following stop.

To narrow the adjacent vowel effects, vowel duration and f0 were added into the analyses. The results showed no effects of vowel duration on the VOT irrespective of the vowels. In American English, the previous vowel duration is longer before voiced consonants than before voiceless consonants [18, 19]. This difference in previous vowel duration plays a role in American English in distinguishing the following voiced/voiceless contrast. However, for the Tohoku dialects in the current study, although the word-medial voicing contrast was often neutralized, the previous vowel duration was not relevant for the VOT. As described in the Introduction, word-medial voiced and voiceless stops are distinguished by the pre-nasalization of voiced stops in the northern varieties of Tohoku dialects, but not in the southern varieties.

In terms of f0, although a few significant effects were seen, the pitch accent, which is lexically assigned to each word in Japanese, was not controlled for the stimulus words and this had a huge effect on vowels. Thus, it is not ideal to include this in the interpretation at this point. However, in discussing the vocal fold tension, it will be necessary to see the f0 effect on the VOT.

A closer look at vowel quality including vowel duration and f0 with more controlled stimuli may lead to a discussion on whether voicing neutralization occurs due to phonological and/or physiological factors.

5. Conclusion

The intervocalic voicing neutralization in Tohoku dialects is a widely known phenomenon. However, quantitative measurements and statistical analyses have long been lacking. In the future, it will be necessary to analyze more data from all sites across the Tohoku district to investigate its sociolinguistic aspects. Analyzing more data will also provide better statistical power by increasing the measurable target segments, especially voiced ones. In addition, it will be necessary to closely examine fully pre-voiced cases with a negative VOT, which could not be incorporated into our analysis. Finally, because word-medial voicing is diminishing among the younger generation [2], it is desirable to explore different age groups and the socio-demographic aspects of these dialects.

6. Acknowledgements

We are truly grateful to the participants of the survey of Tohoku dialects. This research was supported by JSPS Kakenhi 24520438.

7. References

[1] A. Hashimoto, “Tohoku hougen ni okeru kagyou tagyou shi’in no yuuseika ni tsuite: Tohoku hougen onsei chousa kara Ⅱ [On the voicing of /k/ and /t/ in Tohoku dialects: A survey on sounds of Tohoku dialects II],” Tsuda Journal of Language and Culture, 34, pp. 88–102, 2019.
[2] J. Ohashi, Tohoku hougen onsei no kenkyuu [Research on sounds of Tohoku dialects], Tokyo: Oufuu, 2002.
[3] M. Shibatani, The Languages of Japan, Cambridge: Cambridge University Press, 1990.
[4] N. Tsujimura, An Introduction to Japanese Linguistics, Oxford: Blackwell Publishers Inc., 1996.
[5] L. Lisker and A. S. Abramson, “A cross-language study of voicing in initial stops: Acoustical measurements,” Word, 20, pp. 384–422, 1964.
[6] L. Lisker and A. S. Abramson, “Some effects of context on voice onset time in English stops,” Language and Speech, 10, pp. 1–28, 1967.
[7] T. Cho and P. Ladefoged, “Variation and universals in VOT: Evidence from 18 languages,” Journal of Phonetics, 27, pp. 207–229, 1999.
[8] A. S. Abramson and D. Whalen, “Voice Onset Time (VOT) at 50: Theoretical and practical issues in measuring voicing distinctions,” Journal of Phonetics, 63, pp. 75–86, 2017.
[9] M. Takada, Nihongo no gotou heisa’on no kenkyuu: VOT no kyoujiteki bunpu to tsuujiteki henka [Research on the word-initial stops of Japanese: Synchronic distribution and diachronic change in VOT], Tokyo: Kurosio, 2011.
[10] G. J. Docherty, The timing of voicing in British English obstruents, Berlin: Foris Publications, 1992.
[11] L. Davidson, “Variability in the implementation of voicing in American English obstruents,” Journal of Phonetics, 54, pp. 35–50, 2016.
[12] P. Boersma and D. Weenink, Praat: doing phonetics by computer, Version 6.1.08, retrieved 21 December 2019 from http://www.praat.org/
[13] R Core Team, “R: A language and environment for statistical computing,” R Foundation for Statistical Computing, Vienna, Austria, 2019. URL: https://www.R-project.org/.
[14] D. Bates, M. Maechler, B. Bolker, and S. Walker, “Fitting Linear Mixed-Effects Models Using lme4,” Journal of Statistical Software, 67(1), pp. 1–48, 2015.
[15] A. Kuznetsova, P. B. Brockhoff, and R. H. B. Christensen, “lmerTest package: tests in linear mixed effects models,” Journal of Statistical Software, 82(13), 2017.
[16] D. H. Klatt, “Voice Onset Time, frication and aspiration in word-initial consonant clusters,” Journal of Speech and Hearing Research, 18, pp. 686–706, 1975.
[17] P. Auzou, C. Ozsancak, R. J. Morris, M. Jan, F. Eustache, and D. Hannequin, “Voice onset time in aphasia, apraxia of speech and dysarthria: a review,” Clinical Linguistics & Phonetics, 14(2), pp. 131–150, 2000.
[18] A. S. House, “On Vowel Duration in English,” Journal of the Acoustical Society of America, 33, pp. 1174–1178, 1961.
[19] L. J. Raphael, “Preceding vowel duration as a cue to the perception of the voicing characteristic of word-final consonants in American English,” Journal of the Acoustical Society of America, 51, pp. 1296–1303, 1972.