This study represents the first acoustic analysis of the seven-tone system of Lahu Nyi, a dialect of Lahu, a Tibeto-Burman language spoken in Muang Na subdistrict, Chiang Dao district, Chiang Mai province, Thailand. One male and two female native speakers produced the seven tones in isolation and in a sentence context. Pitch contour analysis showed five long tones in open syllables and two short tones in syllables closed with a glottal stop. Low tones are slightly breathy. Pitch contour modification was observed in the sentence context produced by the male speaker, where a high-mid falling tone exhibits a rising contour when followed by another high-mid falling tone.
The Journal of the Acoustical Society of America, 2019
Directional asymmetries have been documented in both infant and adult perception of lexical tones. For example, Tsao (2008) found that a stimulus change from the background Mandarin T1 (55) to the target Mandarin T3 (213) was easier than the reverse among one-year-old Mandarin-learning infants. Yeung et al. (2013) reported that 4- and 9-month-old Mandarin-learning infants are better at Cantonese tone discrimination after being familiarized with T2 (25) than with T3 (33). Francis and Ciocca (2003) found that native Cantonese speakers' tone discrimination was better when the first syllable was higher in frequency (about 4 Hz) than the second syllable. Finally, in an ERP study, Politzer-Ahles et al. (2016) found that Mismatch Negativity (MMN) was attenuated among both native and non-native Mandarin listeners when Mandarin T3 was the standard and another tone was the deviant, compared with the reverse. In this talk, we will report results from two studies examining the effects of memory load and ...
The Journal of the Acoustical Society of America, 2019
It is well established that the production of non-native lexical tone poses a great challenge to adult L2 learners (Hao, 2012; Chang and Yao, 2016; Mok et al., 2018). The production of tonal patterns on disyllabic words is more challenging for non-native speakers because additional computational and/or lexical mechanisms are involved in correctly applying the tone sandhi rules in languages like Mandarin (Chen et al., 2017). Previous speech training studies have shown that perceptual training could improve both perception and production of non-native tones in isolation (Wang et al., 1999, 2003; Wayland and Li, 2008). The current study aims to further examine whether perceptual training promotes learning of tonal patterns on disyllabic words by examining the production of two Mandarin tone sandhi rules, the third tone sandhi and the half-third tone sandhi, by Cantonese learners. Native Cantonese speakers were trained with an identification task and a same/different discrimination task with both r...
This study compared a new approach to measuring lenition with traditional acoustic-based methods. In this new approach, degrees of lenition are estimated from posterior probabilities generated by recurrent neural networks trained to recognize the sonorant and continuant phonological features. These two phonological features capture the range of surface manifestations, from a fricative to an approximant, of lenited voiced and voiceless stops in Spanish. Input to the networks is Mel-filtered log-energy computed from 25-ms windowed frames of each 0.5-s chunk of the input signals. When applied to lenition of intervocalic voiced and voiceless stops, /p, t, k, b, d, g/, in the corpus of Argentinian Spanish built by Google, the new approach yielded lenition patterns largely similar to those obtained using a quantitative acoustic method. Specifically, both approaches revealed that voiced stops were more lenited than voiceless stops, that lenition was more likely in unstressed syllables relativ...
Mandarin tones are perceived categorically by native listeners, but not by non-native listeners (e.g., Francis et al., 2003; Halle et al., 2004; Xu et al., 2006). Vowel quality, stimulus duration, and language background also significantly contributed to categorical perception of tones among native and non-native listeners (Chen et al., 2017). In comparison to pitch production, it was found that a relatively shorter duration is required to perceive than to produce pitch contours, with non-tonal listeners needing a longer duration to detect a change in pitch direction. Duration exerts a stronger effect on between- and within-category discrimination patterns among tonal listeners. Fewer studies have investigated the effects of stimulus duration and vowel quality in trilingual non-native speakers with and without musical training. Our study examines categorical perception of resynthesized pitch stimuli by 13 trilingual Cantonese musicians and 13 Cantonese non-musicians. We created 7-step level-to-falling and level-to-rising pitch continua on a low and a high vowel ([a] and [i]) with 9 different duration values. Cantonese speakers participated in identification and same-different tasks.
Since the dental-retroflex fricative contrast is not consistently maintained in many southern dialects of Chinese, native speakers of these dialects may not accurately produce the Mandarin retroflex fricative /ʂ/. Consequently, /ʂa/ may be realized as [sa] (Duanmu, 2007). This study investigated the variation of the retroflex fricative /ʂ/ in a Chinese Mandarin speech corpus (DataTang, 2018). The corpus contains 200 hours of recordings of 600 speakers from different dialectal regions in China. Each recording was aligned at the phone level using Montreal Forced Aligner. The center of gravity of the acoustic energy (COG) of the target sounds was extracted using Christian DiCanio's Praat script. For statistical analysis, a generalized additive mixed-effects model (GAMM) was used. COG was the response variable. The following vowel's height, tone, and gender were factorial predictors. To evaluate the geographic effect, we used tensor product smooths by fricatives with the longitude and lat...
This study investigated vocal emotions in Japanese by analyzing acoustic features from emotional utterances in the Online Gaming Voice Chat Corpus with Emotional Label (Arimoto and Kawatsu, 2013). The corpus contains recorded sentences produced in 8 emotions by four native Japanese speakers who are professional actors. For acoustic feature extraction, the Praat script ProsodyPro was used. Principal component analysis (PCA) was conducted to evaluate the contribution of each acoustic feature. In addition, a linear discriminant analysis (LDA) classifier was trained with the extracted acoustic features to predict the emotion category and intensity. A generalized additive mixed model (GAMM) was fitted to examine the effect of gender, emotional category, and emotional intensity on the time-normalized f0 values. The GAMM's results suggested the effects of gender, emotion, and emotional intensity on the time-normalized f0 values of vocal emotions in Japanese. The recognition accuracy of the L...
Introduction Heritage speakers have been a population of interest for linguistic research for the unique insight they offer on various topics such as native language acquisition, second language acquisition, distinctions between native and first languages, language dominance and proficiency, language transfer and bilingualism. In particular, those who code-switch (see below for definition and references) as a form of communication are integral to understanding much of the underlying linguistic information present within the minds of bilinguals. Production data can be especially useful in myriad ways, for example to examine syntactic realizations, grammaticality, phonological and phonetic phenomena, and morphological constructions utilized by bilinguals. This project will make use of this invaluable population through recordings of spontaneous, semi-spontaneous and non-spontaneous speech to closely examine the extent to which phonetic convergence (discussed below) exists in the speech of bilinguals. This paper first reports on the previous literature regarding the topics of code-switching, heritage speakers, linguistic convergence, and bilingual phonetic systems. There have been no studies to date that examine a population of English-dominant Spanish heritage speakers in the scope of phonetic linguistic convergence during code-switching. We therefore aim to fill this gap by examining whether there is an occurrence of phonetic convergence in the code-switching speech of English-dominant Spanish heritage speakers. 2. Literature Review 2.1. Code-switching Code-switching has been defined by various past studies (e.g.
The contributions in this Festschrift were written by Ocke's current and former PhD students, colleagues and research collaborators. The Festschrift is divided into six sections, moving from the smallest building blocks of language, through gradually expanding objects of linguistic inquiry, to the highest levels of description, all of which have formed a part of Ocke's career, in connection with his teaching and/or his academic productions: "Segments", "Perception of Accent", "Between Sounds and Graphemes", "Prosody", "Morphology and Syntax" and "Second Language Acquisition". Each one of these illustrates a sound approach to language matters.
A phonologically informed neural network approach, Phonet, was compared to acoustic measurements of intensity, duration and harmonicity in estimating the degree of lenition of voiced and voiceless stops in a corpus of Argentine Spanish. Recurrent neural networks were trained to recognize the phonological features [sonorant] and [continuant]. Their posterior probabilities were computed over the target segments. Relative to most acoustic metrics, posterior probabilities of the two features are more consistent, and in the direction predicted by known factors of lenition: stress, voicing, place of articulation, surrounding vowel height, and speaking rate. The results suggest that Phonet could quantify gradient lenition more reliably than some acoustic metrics.
Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023, 2023
Alcohol is known to impair fine articulatory control and movements. In drunken speech, incomplete closure of the vocal tract can result in deaffrication of the English affricates /tʃ/ and /ʤ/, spirantization (fricative-like production) of the stop consonants and palatalization (retraction of place of articulation) of the alveolar fricative /s/ (produced as /ʃ/). Such categorical segmental errors have been well reported. This study employs a phonologically informed neural network approach to estimate degrees of deaffrication of /tʃ/ and /ʤ/, spirantization of /t/ and /d/ and place retraction for /s/ in a corpus of intoxicated English speech. Recurrent neural networks were trained to recognize the relevant phonological features [anterior], [continuant] and [strident] in a control speech corpus. Their posterior probabilities were computed over the segments produced under intoxication. The results revealed both categorical and gradient errors and thus suggest that this new approach could reliably quantify fine-grained errors in intoxicated speech.
Spanish voiced stops /b, d, ɡ/ surface as fricatives [β, ð, ɣ] in intervocalic position due to a phonological process known as spirantization or, more broadly, lenition. However, phonetic studies reveal a great deal of variation and gradience in these surface forms, which range from fricative-like to approximant-like [β̞, ð̞, ɣ̞] and are conditioned by factors such as stress, place of articulation, flanking vowel quality, and speaking rate. Several acoustic measurements have been used to quantify the degree of lenition, but none is standard. In this study, the posterior probabilities of the sonorant and continuant phonological features, estimated by a deep learning Phonet model in a corpus of Argentinian Spanish, were used as measures of lenition and compared to traditional acoustic measurements of intensity, duration, and periodicity. When evaluated against known lenition factors (stress, place of articulation, surrounding vowel quality, word status, and speaking rate), the results show that sonorant and continuant posterior probabilities predict lenition patterns that are similar to those predicted by relative acoustic intensity measures and are in the direction expected by the effort-based view of lenition and previous findings. These results suggest that Phonet is a reliable alternative or additional approach to investigate the degree of lenition.
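As a rough illustration of how a feature posterior can serve as a gradient lenition score, the sketch below averages frame-level posteriors over the frames that fall inside a target segment. It is only a minimal sketch under stated assumptions: the function name `segment_posterior`, the 10-ms frame step, the segment boundaries, and the posterior values are all hypothetical, and averaging is an assumed aggregation rule, not necessarily the one used with Phonet in these studies.

```python
from statistics import mean

def segment_posterior(frame_posteriors, frame_step_ms, start_ms, end_ms):
    """Mean posterior over frames whose start time lies in [start_ms, end_ms).

    Assumption: frame i starts at i * frame_step_ms (integer milliseconds).
    """
    selected = [
        p for i, p in enumerate(frame_posteriors)
        if start_ms <= i * frame_step_ms < end_ms
    ]
    if not selected:
        raise ValueError("segment contains no frames")
    return mean(selected)

# Hypothetical frame-level [continuant] posteriors, one value every 10 ms.
continuant = [0.1, 0.2, 0.8, 0.9, 0.7, 0.1]

# Hypothetical intervocalic stop spanning 20-50 ms; a higher score would be
# read as a more continuant-like (more lenited) realization.
score = segment_posterior(continuant, 10, 20, 50)
```

Under this reading, the same function would be applied separately to the [sonorant] and [continuant] posteriors of each target stop, giving two scores per token that can then be modeled against factors such as stress or speaking rate.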
Papers by Ratree Wayland