
An EPG+UTI study of Italian /r/

2015

This paper describes a system for the acquisition, real-time synchronization and analysis of acoustic, electropalatographic (EPG) and ultrasonographic (UTI) data. Simultaneous data on linguo-palatal contact and tongue sagittal profiles are captured for rhotic consonants produced by a native speaker of Italian. Three anterior variants of /r/ ([ɾ], [ɹ̝] and [ɹ]) are shown to be realized with an apical tongue gesture, but different vowel-related coarticulation patterns. The paper discusses the implication of the proposed analysis for a coherent investigation of lingual and linguo-palatal dynamics.

Lorenzo Spreafico (a), Chiara Celata (b), Alessandro Vietti (a), Chiara Bertini (b), Irene Ricci (b)

(a) ALPs, Alpine Laboratory of Phonetics and Phonology, Free University of Bozen-Bolzano, Italy
(b) Laboratorio di Linguistica 'Giovanni Nencioni', Scuola Normale Superiore, Pisa, Italy

{lorenzo.spreafico, alessandro.vietti}@unibz.it; {chiara.celata, c.bertini, i.ricci}@sns.it

Keywords: rhotics, electropalatography, ultrasound tongue imaging, multichannel database, Italian.

1. INTRODUCTION

Speech articulatory complexity poses challenging questions to both phonetic-phonological theories and speech technology research. The wide range of applications of articulatory studies explains the growing interest in the collection of multi-level corpora of speech data integrating data from diverse instrumental techniques [7,8,14,15,16]. Within the framework of multi-level articulatory approaches to speech variation, this paper describes an original system for the acquisition, real-time synchronization and analysis of acoustic, electropalatographic (EPG) and ultrasonographic (UTI) data.

The aim of the paper is twofold. The first aim is to introduce the implementation procedure and the obtained system for high-speed synchronous audio-EPG-UTI recording. The system is used to capture simultaneous data on linguo-palatal contact and tongue sagittal profiles of rhotic consonants produced by native speakers of a Tuscan variety of Italian. EPG and UTI instrumental techniques are to be seen as complementing each other inasmuch as the former observes linguo-palatal dynamics from the perspective of the realized contacts, with particular precision in the anterior vocal tract, while UTI focuses on linguo-palatal distances and movements of the whole tongue profile, including tongue root and postdorsum.

The second goal of the paper is that of providing a multi-level articulatory definition for alveolar rhotics according to well-established linguo-palatal contact indexes and common procedures for lingual shape analysis. Innovative data reduction techniques allowing a coherent investigation of lingual and palatal dynamics are also discussed.

Rhotic sounds are known for their exceptional degree of syntagmatic, cross-linguistic, and even cross-individual variability [5,13]; an integrated perspective on the dynamics of the vocal tract is expected to provide cues for a deeper understanding of their phonetic properties and systemic behaviour.

2. MULTICHANNEL ACOUSTIC-ARTICULATORY SYSTEM

2.1. Experiment design

The speech corpus has been collected through a multi-repetition sentence reading task performed by 2 adult female speakers of a Western Tuscan variety of Italian (Pisa and Livorno districts). 62 sentences of equal length (8 phonetic syllables) and consistent intonation structure were selected from the list in [3]. The /r/ appeared either in intervocalic position, or in a biphonemic consonant cluster. When embedded in clusters, the rhotics could be preceded or followed by obstruents of various places of articulation (bilabial, alveolar and velar) and laryngeal status (voiced and unvoiced). The vocalic context could also vary, with /i/, /a/ and /u/ symmetrically distributed before and after the consonantal interval.
A total of 558 stimuli were collected for the two speakers. The present analysis is based on a subset of 237 stimuli produced by one of the speakers.

2.2. Data acquisition and annotation procedure

The data acquisition system was able to synchronously record EPG and UTI data at sufficiently high frame rates to correctly characterize the movements of the tongue, as well as the acoustic speech signals. The acquisition platform was developed using Articulate Instruments hardware and the Articulate Assistant Advanced (AAA) software tool [17] in the ALPs lab in Bozen. Electropalatographic data was captured via the WinEPG (SPI 1.0) unit recording EPG at 100 Hz. Ultrasound data was captured via the Ultrasonix device used in conjunction with a head-mounted micro-convex probe, with depth set at 80 mm and angle at 127°, capturing a mid-sagittal tongue image at a rate of 91 Hz. In order to synchronize EPG with UTI, the microphone signal was passed through the WinEPG unit. The WinEPG thereby added a synchronization tone to the microphone signal, which AAA used to align the EPG data with the synchronization pulses from the SonixTablet.

The audio files were exported into Praat (5.3.62) for an acoustic and auditory analysis of /r/ realizations. Each /r/ token was first categorized according to manner of articulation (i.e. trill, tap, approximant, fricative). The constriction interval was annotated as voiced or unvoiced according to spectrographic evidence. The following analysis focuses on the three alveolar rhotic variants, namely the tap [ɾ], the fricative [ɹ̝] and the approximant [ɹ]. While the tap was the most frequent realization in the dataset, approximants and fricatives were also present across phonetic contexts, and no specific distributional bias could be detected for any of the variants.

3. EPG+UTI ARTICULATORY ANALYSIS

The analysis focused first on the articulatory correlates of the three rhotic variants, with the aim of verifying to what extent the acoustic-auditory classification could be corroborated by coherent sets of articulatory characteristics. Within-category variation was also analysed by investigating the influence of the phonetic context on the realization of variants. In particular, the present contribution focuses on vowel-induced coarticulation. The articulatory measurements that are described and discussed in the following sections were taken at the midpoint of all constriction intervals.

3.1. EPG analysis

A traditional analysis based on the contact index method [10,11] was performed on the anterior (rows 1-4) and posterior (rows 5-8) parts of the palate. The Qp index (percentage of contact in the posterior palate) was extracted in order to determine the proportion of activated electrodes in the velar-mediopalatal region. In addition, two indexes were extracted for the anterior palate. CAa (contact anteriority in the anterior palate) served as a measure of the anteriority of contact along the sagittal dimension. CCa (contact centrality in the anterior palate) was calculated to determine the coronal extension of linguo-palatal contact: the activated electrodes in the four central columns of the palate receive a higher score than the activated electrodes on the four peripheral columns (two on the right and two on the left).

We hypothesized that the linear combination of the three dependent variables Qp, CAa and CCa may account for the categorization of the tokens into taps, fricatives and approximants as well as for differences in their coarticulatory behaviour as a function of variations in the vocalic context. The interaction between the two factors 'variant' and 'vowel place' in multivariate tests confirmed the hypothesis (e.g. Pillai's trace F=2.311, df=12, p<.001), thus indicating that the vocalic context influenced the realization of the three variants to a different extent. Univariate tests were run to ascertain the contribution of the individual dependent variables to the interaction. They showed that the two indexes concerning the anterior contact patterns did contribute significantly (CAa: F=3.912, df=4, p<.01; CCa: F=3.629, df=4, p<.01) while Qp gave non-significant results (Fig. 1).

Between-subject tests and Bonferroni multiple comparisons showed in particular that, in the posterior palate, there was a significant effect of vowel place (F=61.710, df=2, p<.001), with highest contact values for the /i/ > /u/ > /a/ contexts, but a non-significant effect of variant, thus indicating a generalized increase of contact in the mediopalatal/velar region caused by coarticulation with the high front vowel. In the anterior palate, variant produced significant effects (CAa: F=23.375, df=2, p<.001; CCa: F=40.883, df=2, p<.001) while vowel place did not. Post-hoc Bonferroni tests revealed in particular that the three variants were consistently different for contact extension (tap > fricative > approximant) and less so for contact anteriority (tap > fricative, approximant), since in the /u/ context the approximant and the fricative were not significantly different for contact anteriority. Only two constriction locations for the rhotics were thus detectable when the adjacent vowels required a raised and retracted tongue dorsum. The tap consistently showed the anterior and most extended contact pattern across vocalic context and turned out to be the least resistant to V-induced coarticulation.

Figure 1: Average Qp, CAa and CCa values at constriction midpoint for the three rhotic variants as a function of vocalic context. Grey line: fricatives; black continuous: approximants; black dashed: taps. High resolution images are available on the web by clicking on the figures.
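The three contact indexes described in §3.1 can be illustrated with a minimal sketch. The weights below are assumptions chosen only to show the idea (more anterior rows and more central columns score higher); they are not the published index formulas of [10,11], and the 8x8 list-of-lists frame layout is a hypothetical representation rather than the WinEPG export format.

```python
# Simplified contact indexes for one 8x8 EPG frame (illustrative only).
# Row 0 is the most anterior row; 1 = contacted electrode, 0 = no contact.
# The two corner positions of row 0 do not exist on a 62-electrode palate.

ROW_WEIGHTS = [4, 3, 2, 1]              # assumed: more anterior rows score higher
COL_WEIGHTS = [1, 1, 2, 2, 2, 2, 1, 1]  # assumed: central columns score higher


def is_valid(row, col):
    """Return False for the two missing corner electrodes of the first row."""
    return not (row == 0 and col in (0, 7))


def weighted_contact(frame, rows, weight):
    """Weighted share of contacted electrodes over the given rows (0 to 1)."""
    num = den = 0.0
    for r in rows:
        for c in range(8):
            if is_valid(r, c):
                w = weight(r, c)
                den += w
                num += w * frame[r][c]
    return num / den


def epg_indices(frame):
    """Return simplified Qp, CAa and CCa values for one frame."""
    anterior, posterior = range(0, 4), range(4, 8)
    return {
        "Qp": weighted_contact(frame, posterior, lambda r, c: 1),
        "CAa": weighted_contact(frame, anterior, lambda r, c: ROW_WEIGHTS[r]),
        "CCa": weighted_contact(frame, anterior, lambda r, c: COL_WEIGHTS[c]),
    }


# Example: full contact on the first (most anterior) row only.
frame = [[0] * 8 for _ in range(8)]
frame[0] = [1] * 8
print(epg_indices(frame))
```

In this simplified form each index grows both with the number of contacted electrodes and with their position, which is the property the analysis relies on when comparing taps, fricatives and approximants.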
The EPG analysis thus revealed that the three rhotic variants were all characterized by an anterior articulation with no active post-dorsum involvement, except for an increase in posterior contact due to the coarticulation with high back vowels. The linguo-palatal contact investigation also revealed that the most robust difference among variants referred to contact degree in the anterior palate, with taps consistently realizing more contact and a more anterior configuration than both approximants and fricatives.

3.2. UTI analysis

Using the algorithm implemented in the AAA software [2], cubic splines were fitted to sagittal tongue curves at each mid-constriction for /r/. Fitted splines were exported to a workspace to calculate an average tongue contour for each variant, based on means at each of 42 fan lines (Fig. 2).

Figure 2: Means for the tongue shape during the constriction phase for /r/. Green line: palate; blue lines: tap and fricative; red line: approximant. Teeth on the right, pharynx on the left.

Figure 2 shows palate and mean splines for the alveolar tap, the fricative and the approximant. Overall, the mean profiles describe a mid-bunched contour, where the tongue tip forms the primary constriction; the front of the tongue is lowered while the dorsum is raised towards the hard palate. The visual comparison of the contours for each variant showed that the tongue post-dorsum and root were lower for [ɾ] than for [ɹ̝] and [ɹ]. A quantitative smoothing spline ANOVA (SS ANOVA) [4] showed differences of greater magnitude (Fig. 3).

Figure 3: Smoothed splines for the /r/ variants.

Although very close to one another, the curves appeared to diverge at the most posterior and most anterior parts of the tongue. Bayesian confidence intervals of the interaction effects were calculated to determine whether and where the curves were significantly different (Fig. 4).

Figure 4: Interaction effects with Bayesian confidence intervals for [ɹ], [ɹ̝] and [ɾ].
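The averaging step mentioned in this section (a mean tongue contour per variant, computed at each fan line) can be sketched as follows. This is only an illustration under stated assumptions: each token is assumed to be a list of radial distances, one per fan line, with None where the spline does not cross that line; the function names and the data layout are hypothetical and do not reproduce the AAA export format. The example uses 3 fan lines instead of 42 for brevity.

```python
from statistics import mean


def mean_contour(contours):
    """Average the radial distance at each fan line, skipping missing values."""
    n_lines = len(contours[0])
    averaged = []
    for i in range(n_lines):
        values = [c[i] for c in contours if c[i] is not None]
        # A fan line with no data in any token stays undefined.
        averaged.append(mean(values) if values else None)
    return averaged


def mean_contours_by_variant(tokens):
    """Group (variant, contour) pairs by variant and average each group."""
    grouped = {}
    for variant, contour in tokens:
        grouped.setdefault(variant, []).append(contour)
    return {variant: mean_contour(cs) for variant, cs in grouped.items()}


# Hypothetical tokens: distances in mm along 3 fan lines (back to front).
tokens = [
    ("tap", [10.0, 20.0, None]),
    ("tap", [12.0, 22.0, 30.0]),
    ("approximant", [8.0, None, 28.0]),
]
print(mean_contours_by_variant(tokens))
```

Averaging per fan line keeps the comparison in the probe-centred coordinate system, so mean contours for different variants can be overlaid directly, as in Figure 2.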
As regards approximants, the smoothed spline was not significantly different from the spline that best fits all data (i.e. the Bayesian intervals encompassed zero) for about two-thirds of the total length of the tongue. The most significant difference corresponded to the posterior part of the tongue, thus suggesting slight root retraction in the articulation of approximants. As for fricatives, the significant difference corresponded again to the post-dorsum, displaying tongue lowering. As for taps, the difference mostly involved the anterior part of the tongue, presenting a tip-raising gesture.

To further investigate such apparent dissimilarity of the underlying splines, we extracted smoothed splines for the tongue shape during the closure phase of /r/ in the /a/, /i/ and /u/ environment respectively (Figs. 5, 6). The data clearly confirm the V-induced coarticulation found in the EPG analysis.

Figure 5: Smoothed splines for /r/ when surrounded by [a] (red), [i] (green) and [u] (blue).

Figure 6: Interaction effects with Bayesian confidence intervals for [a], [i] and [u].

Finally, an original scalar measure was devised: the areas enclosed by the tongue spline, the roof spline, and either front line 3 and front line 5 (FrontalArea1), front line 5 and front line 7 (FrontalArea2), or front line 7 and dorsal line 3 (DorsoFrontalArea) were calculated in the AAA workspace (Fig. 7).

Figure 7: Red box: FrontalArea1; blue box: FrontalArea2; yellow box: DorsoFrontalArea.

Figure 8 shows that the mean measure for FrontalArea1 increases from the tap, to the fricative, to the approximant, thus mirroring the EPG data on constriction degree shown in §3.1.

Figure 8: Plot of mean FrontalArea1 for [ɹ], [ɹ̝] and [ɾ].

As a last step, in order to establish correlations between the EPG indexes and the UTI area measures, a stepwise linear regression was computed. Results (Fig. 9) showed a significant but rather mild interaction between Qp and DorsoFrontalArea, as well as an even slighter interaction between FrontalArea2 and both CAa and CCa.

Figure 9: Correlations among EPG indexes and UTI scalar measures of areas.

4. DISCUSSION

This study has shown the implementation and analytic reliability of a multi-level acoustic-articulatory system allowing real-time alignment of acoustic, UTI and EPG data on lingual movements during speech articulation. The system proved useful to characterize the apical rhotics of a selected Tuscan Italian variety by showing that the observed articulatory variability is to be referred to both linguo-palatal contact patterns and tongue body and root synergistic behaviour with respect to vowel-dependent articulatory movements. In particular, the study has shown that [ɾ], [ɹ̝] and [ɹ] are realized by means of substantially similar lingual gestures, i.e. involving apico-predorsal coupling, predorsum raising but no tongue-body retraction, as typically reported for apico-alveolar non-trilling rhotics [9,11]. However, the three variants differ for constriction degree and coarticulatory resistance to adjacent vowel, with taps showing the most constrained apical configuration and the most /i/-like tongue dorsum configuration. Position in the syllable and consonantal context may account for other production events that are left to future investigation.

Additional implications of the current study refer to the theoretical and methodological challenges of combining electropalatographic and ultrasonographic information in the description of the articulatory events. The adoption of innovative scalar measures for UTI investigation such as the area measures will allow establishing direct correlations between the palatal contact indices and the distance measures calculated for selected portions of the lingual profile.
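The area measures referred to above (the region enclosed by the tongue spline, the palate spline and two bounding lines) can be approximated with a polygon area. The sketch below is an assumption-laden illustration, not the AAA implementation: the bounding lines are approximated by two fan lines, contours are given as radial distances at known fan angles, and the names enclosed_area, tongue, palate and angles are hypothetical.

```python
import math


def polar_to_xy(radius, angle_deg):
    """Convert a point on a fan line (radius, angle) to Cartesian coordinates."""
    a = math.radians(angle_deg)
    return radius * math.cos(a), radius * math.sin(a)


def shoelace_area(points):
    """Area of a simple polygon given as an ordered list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def enclosed_area(tongue, palate, angles, first, last):
    """Area between tongue and palate contours from fan line first to last."""
    idx = list(range(first, last + 1))
    # Walk along the tongue, then back along the palate, to close the polygon.
    lower = [polar_to_xy(tongue[i], angles[i]) for i in idx]
    upper = [polar_to_xy(palate[i], angles[i]) for i in reversed(idx)]
    return shoelace_area(lower + upper)


# Hypothetical contours in mm on two fan lines 90 degrees apart.
print(enclosed_area([1.0, 1.0], [2.0, 2.0], [0.0, 90.0], 0, 1))
```

With real data the two bounding indexes would be chosen to match the lines that delimit FrontalArea1, FrontalArea2 or DorsoFrontalArea, and a smaller area would correspond to a tighter constriction.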
The continuation of this study will thus highlight the existing relationships between the palatographic / sonographic representations of lingual movements and the resulting acoustic events (formant structures) during the production of contextually different rhotic realizations.

5. ACKNOWLEDGEMENTS

Thanks to Alan Wrench for his helpfulness. Financial support from: Provincia Autonoma di Bolzano – Alto Adige, Ripartizione allo studio, Università e ricerca scientifica 2013-16 “The articulatory sociophonetics of bilinguals in South-Tyrol: The Ultrasound Tongue Imaging potential”; Scuola Normale Superiore, Laboratorio di Linguistica and project 367-GR13Celata “Modeling speech variation in the socio-communicative context” 2013-2015.

6. REFERENCES

[1] Articulate Instruments Ltd 2010. WinEPG Installation and User's Manual: Revision 1.18. Edinburgh, UK: Articulate Instruments Ltd.
[2] Articulate Instruments Ltd 2014. Articulate Assistant Advanced User Guide: Version 2.15. Edinburgh, UK: Articulate Instruments Ltd.
[3] Celata, C., Bertini, C., Ricci, I. 2014. Proprietà acustiche e articolatorie di /r/ nella Toscana occidentale. X Convegno Nazionale AISV Torino, 22–24 January, 2014.
[4] Davidson, L. 2006. Comparing tongue shapes from ultrasound imaging using smoothing spline analysis of variance. J. Acoust. Soc. Am. 120/1, 407–415.
[5] Docherty, G., Foulkes, P. 2001. Variability in /r/ production: Instrumental perspectives. In: Van de Velde, H., van Hout, R. (eds), r-atics: sociolinguistic, phonetic and phonological characteristics of /r/. Bruxelles: ILVP/ULB, 173–184.
[6] Lawson, E., Scobbie, J., Stuart-Smith, J. 2011. The social stratification of tongue shape for postvocalic /r/ in Scottish English. J. Sociolinguistics 15/2, 256–268.
[7] Meister, E., Meister, L. 2012. Multimodal Corpus of Speech Production: Work in Progress. In: Tavast, A. et al. (eds), Human Language Technologies. The Baltic Perspective. Amsterdam: IOS Press, 146–153.
[8] Narayanan, S. et al. 2014. Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research. J. Acoust. Soc. Am. 136/3, 1307–1311.
[9] Nicolaidis, K., Baltazani, M. 2011. An electropalatographic and acoustic study of the Greek rhotic in /Cr/ clusters. Proc. 17th ICPhS Hong-Kong, 1474–1478.
[10] Recasens, D., Pallarès, M.D. 1999. A study of /ɾ/ and /r/ in the light of the “DAC” coarticulation model. Journal of Phonetics 19, 267–280.
[11] Recasens, D., Espinosa, A. 2007. Phonetic typology and positional allophones for alveolar rhotics in Catalan. Phonetica 64, 1–28.
[12] Schabus, D. 2014. The MMASCS multi-modal annotated synchronous corpus of audio, video, facial motion and tongue motion data of normal, fast and slow speech. Proc. 9th LREC Reykjavik, 3411–3416.
[13] Spreafico, L., Vietti, A. 2013. On rhotics in a bilingual community: A preliminary UTI research. In: Spreafico, L., Vietti, A. (eds), Rhotics. New data and perspectives. Bolzano: BU Press, 57–79.
[14] Steiner, I., Richmond, K., Marshall, I., Gray, C. 2012. The magnetic resonance imaging subset of the mngu0 articulatory corpus. J. Acoust. Soc. Am. 131/2, 106–111.
[15] Steiner, I., Knopp, P., Musche, P., Schmiedel, A., Braun, A., Ouni, S. 2014. Investigating the effects of posture and noise on speech production. Proc. 10th ISSP Cologne, 413–415.
[16] Wrench, A.A. 1999. The MOCHA-TIMIT articulatory database. http://www.cstr.ed.ac.uk/research/projects/artic/mocha.htm
[17] Wrench, A.A., Scobbie, J.M. 2008. High-speed cineloop ultrasound vs. video ultrasound tongue imaging: comparison of front and back lingual gesture location and relative timing. Proc. 8th ISSP Strasbourg, 57–60.