
Stefano Bannò

University of Cambridge, ALTA Institute, Post-Doc


I’m a Research Associate at the Cambridge University Institute for Automated Language Teaching and Assessment (ALTA). After obtaining my PhD at the University of Trento and Fondazione Bruno Kessler with a thesis on automatic assessment of L2 spoken English, I’m continuing my research in Cambridge.

Before starting the doctoral course, I obtained a master's degree in Philology and Literary Criticism and a bachelor's degree in Historical, Philological and Literary Studies at the University of Trento. During my master's studies, I spent a research period at the Lautarchiv of the Humboldt University in Berlin, where I analysed a corpus of phonographic recordings in multiple Italian dialects.

Besides my academic career, I have worked as a musician and a secondary school teacher. My research interests range from machine learning, natural language processing and computational linguistics to phonetics, sociolinguistics and second language acquisition, testing and assessment.


Papers by Stefano Bannò

Automatic Assessment of Conversational Speaking Tests

Proc. 9th Workshop on Speech and Language Technology in Education (SLaTE), 2023

Many speaking tests are conversational, dialogic, in form, with an interlocutor talking to one or more candidates. This paper investigates how to automatically assess such a test. State-of-the-art approaches are used in a multi-stage pipeline: diarization and speaker assignment, to detect who is speaking and when; automatic speech recognition (ASR), to produce a transcript; and finally assessment. Each presents challenges which are investigated in the paper. Advanced foundation model-based auto-markers are examined: an ensemble of Longformer-based models that operates on the ASR output text; and a wav2vec2-based system that works directly on the audio. The two are combined to yield the final score. This fully automated system is evaluated in terms of ASR performance, and related impact of candidate assignment, as well as prediction of the candidate mark on data from the Occupational English Test. This is a conversational speaking test for L2 English healthcare professionals.
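The abstract states that the text-based ensemble and the audio-based system are combined into a final score, but it does not say how. As a minimal sketch, assuming a simple linear interpolation (the function name `combine_scores` and the `audio_weight` parameter are hypothetical and not taken from the paper), the combination could look like this:

```python
def combine_scores(text_scores, audio_score, audio_weight=0.5):
    """Combine an ensemble of text-based predictions with one audio-based prediction.

    Assumption: the ensemble is averaged first, then linearly interpolated
    with the audio score. The paper may use a different combination scheme.
    """
    if not text_scores:
        raise ValueError("text_scores must contain at least one prediction")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    # Mean of the ensemble members' predictions.
    text_mean = sum(text_scores) / len(text_scores)
    # Weighted average of the two system-level scores.
    return (1.0 - audio_weight) * text_mean + audio_weight * audio_score


# Two ensemble members predict 4.0 and 5.0; the audio system predicts 3.5.
final_score = combine_scores([4.0, 5.0], 3.5)
print(final_score)  # 4.0
```

In practice the interpolation weight would be tuned on held-out data rather than fixed at 0.5.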

Assessment of L2 Oral Proficiency Using Self-Supervised Speech Representation Learning

Proc. 9th Workshop on Speech and Language Technology in Education (SLaTE), 2023

A standard pipeline for automated spoken language assessment is to start with an automatic speech recognition (ASR) system and derive features that exploit transcriptions and audio. Although efficient, these approaches require ASR systems that can be used for second language (L2) speakers and preferably tuned to the specific form of test being deployed. Recently, a self-supervised speech representation-based scheme requiring no ASR was proposed. This work extends the initial analysis to a large-scale proficiency test, Linguaskill. The performance of a self-supervised, wav2vec 2.0, system is compared to a high-performance hand-crafted assessment system and a BERT-based system, both of which use ASR transcriptions. Though the wav2vec 2.0-based system is found to be sensitive to the nature of the response, it can be configured to yield comparable performance to systems requiring transcriptions and shows significant gains when appropriately combined with standard approaches.

Grammatical Error Correction for L2 Speech Using Publicly Available Data

9th Workshop on Speech and Language Technology in Education (SLaTE)

Over the past decades, the demand for learning English as a second language (L2) has grown consistently, as it has gradually become the lingua franca of business, culture, entertainment, and academia. This aspect has contributed to an increasing demand for systems for automatic feedback for applications in Computer-Assisted Language Learning. In this regard, mastering grammar is a key element of L2 speaking proficiency. In this paper, we illustrate an approach to spoken grammatical error correction (GEC) in a cascaded fashion using only publicly available training data. Specifically, we start from learners' utterances, investigate disfluency detection, and finally explore GEC. We test this pipeline on NICT-JLE, a publicly available L2 corpus, and TLT-GEC, a private dataset that is under preparation for release. We obtain promising results which outperform previous studies that used large proprietary datasets, and we set a potential baseline for future experiments on spoken GEC.
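The cascaded structure described above (disfluency handling first, then GEC) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the paper uses trained models, whereas this example swaps in a rule-based disfluency filter and a lookup-table corrector, and the names `FILLED_PAUSES`, `remove_disfluencies`, `toy_gec` and `cascade` are hypothetical.

```python
# Hypothetical list of filled pauses; a real system would use a trained detector.
FILLED_PAUSES = {"uh", "um", "er", "erm"}


def remove_disfluencies(tokens):
    """Rule-based stand-in for disfluency detection.

    Drops filled pauses and immediate word repetitions. Note that this also
    removes legitimate repetitions (e.g. "had had"), which a trained model
    could learn to keep.
    """
    fluent = []
    for tok in tokens:
        low = tok.lower()
        if low in FILLED_PAUSES:
            continue  # skip filled pause
        if fluent and fluent[-1].lower() == low:
            continue  # skip immediate repetition
        fluent.append(tok)
    return fluent


def toy_gec(tokens):
    """Lookup-table stand-in for a GEC model (illustration only)."""
    corrections = {("I", "goes"): "go"}
    out = []
    for tok in tokens:
        prev = out[-1] if out else None
        out.append(corrections.get((prev, tok), tok))
    return out


def cascade(tokens, gec):
    """Run the two stages in sequence: disfluency removal, then GEC."""
    return gec(remove_disfluencies(tokens))


utterance = "I um I I goes to to school".split()
print(" ".join(cascade(utterance, toy_gec)))  # I go to school
```

Passing the GEC stage in as a callable keeps the modules independent, which mirrors the reason a cascade can be trained on separate, publicly available datasets.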

Proficiency assessment of L2 spoken English using wav2vec 2.0

The increasing demand for learning English as a second language has led to a growing interest in methods for automatically assessing spoken language proficiency. Most approaches use hand-crafted features, but their efficacy relies on their particular underlying assumptions and they risk discarding potentially salient information about proficiency. Other approaches rely on transcriptions produced by ASR systems which may not provide a faithful rendition of a learner's utterance in specific scenarios (e.g., non-native children's spontaneous speech). Furthermore, transcriptions do not yield any information about relevant aspects such as intonation, rhythm or prosody. In this paper, we investigate the use of wav2vec 2.0 for assessing overall and individual aspects of proficiency on two small datasets, one of which is publicly available. We find that this approach significantly outperforms the BERT-based baseline system trained on ASR and manual transcriptions used for comparison.

View-Specific Assessment of L2 Spoken English

Proc. Interspeech 2022, 2022

The growing demand for learning English as a second language has increased interest in automatic approaches for assessing and improving spoken language proficiency. A significant challenge in this field is to provide interpretable scores and informative feedback to learners through individual viewpoints of learners' proficiency, as opposed to holistic scores. Thus far, holistic scoring remains commonly applied in large-scale commercial tests. As a result, an issue with more detailed evaluation is that human graders are generally trained to provide holistic scores. This paper investigates whether view-specific systems can be trained when only holistic scores are available. To enable this process, view-specific networks are defined where both their inputs and structure are adapted to focus on specific facets of proficiency. It is shown that it is possible to train such systems on holistic scores, such that they provide view-specific scores at evaluation time. View-specific networks are designed in this way for pronunciation, rhythm, text, use of parts of speech and grammatical accuracy. The relationships between the predictions of each system are investigated on the spoken part of the Linguaskill proficiency test. It is shown that the view-specific predictions are complementary in nature and capture different information about proficiency.

Cross-corpora experiments of automatic proficiency assessment and error detection for spoken English

Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022), 2022

The growing demand for learning English as a second language has led to an increasing interest in automatic approaches for assessing spoken language proficiency. One of the most significant challenges in this field is the lack of publicly available annotated spoken data. Another common issue is the lack of consistency and coherence in human assessment. To tackle both problems, in this paper we address the task of automatically predicting the scores of spoken test responses of English-as-a-second-language learners by training neural models on written data and using the presence of grammatical errors as a feature, as they can be considered consistent indicators of proficiency through their distribution and frequency. Specifically, we train a feature extractor on EFCAMDAT, a large written corpus containing error annotations and proficiency levels assigned by human experts, in order to extract information related to grammatical errors and, in turn, we use the resulting model for inference on the CLC-FCE corpus, on the ICNALE corpus, and on the spoken section of the TLT-school corpus, a collection of proficiency tests taken by Italian students. The work investigates the impact of the feature extractor on spoken proficiency assessment as well as the written-to-spoken approach. We find that our error-based approach can be beneficial for assessing spoken proficiency. The results obtained on the considered datasets are discussed and evaluated with appropriate metrics.

On Assessing and Developing Spoken 'Grammatical Error Correction' Systems

Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022), 2022

Spoken 'grammatical error correction' (SGEC) is an important process to provide feedback for second language learning. Due to a lack of end-to-end training data, SGEC is often implemented as a cascaded, modular system, consisting of speech recognition, disfluency removal, and grammatical error correction (GEC). This cascaded structure enables efficient use of training data for each module. It is, however, difficult to compare and evaluate the performance of individual modules, as preceding modules may introduce errors. For example, the GEC module input depends on the output of non-native speech recognition and disfluency detection, both challenging tasks for learner data. This paper focuses on the assessment and development of SGEC systems. We first discuss metrics for evaluating SGEC, both individual modules and the overall system. The system-level metrics enable tuning for optimal system performance. A known issue in cascaded systems is error propagation between modules. To mitigate this problem, semi-supervised approaches and self-distillation are investigated. Lastly, when an SGEC system is deployed, it is important to give accurate feedback to users. Thus, we apply filtering to remove low-confidence edits, aiming to improve overall feedback precision. The performance metrics are examined on a Linguaskill multi-level data set, which includes the original non-native speech, manual transcriptions and reference grammatical error corrections, to enable system analysis and development.
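The confidence-based filtering mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the `Edit` class, the span-based edit representation, the threshold value and the convention that precision is 0.0 when no edits are proposed are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """A proposed correction: replace tokens[start:end] with `replacement`."""
    start: int
    end: int
    replacement: str
    confidence: float  # hypothetical model confidence in [0, 1]


def filter_edits(edits, threshold):
    """Keep only the edits whose confidence reaches the threshold."""
    return [e for e in edits if e.confidence >= threshold]


def precision(proposed, reference):
    """Fraction of proposed edits that also appear in the reference set.

    `reference` is a set of (start, end, replacement) tuples.
    Returns 0.0 when nothing is proposed (a convention chosen here).
    """
    if not proposed:
        return 0.0
    correct = sum(1 for e in proposed if (e.start, e.end, e.replacement) in reference)
    return correct / len(proposed)


proposed = [
    Edit(0, 1, "has", 0.9),   # matches the reference
    Edit(2, 3, "the", 0.3),   # spurious, low confidence
    Edit(4, 5, "went", 0.8),  # matches the reference
]
reference = {(0, 1, "has"), (4, 5, "went")}

print(precision(proposed, reference))                     # 2 of 3 edits correct
print(precision(filter_edits(proposed, 0.5), reference))  # 1.0
```

Raising the threshold trades recall for precision: fewer edits reach the learner, but those that do are more likely to be correct.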

Wilhelm Doegen and the Königlich-Preussische Phonographische Kommission

Towards error-based strategies for automatically assessing ESL learners' proficiency

Collated Papers for the 7th ALTE International Conference, 2021

In this paper we propose potential strategies for automatically assessing second language proficiency based on the presence of errors only. We used an open-source grammar and spelling check tool to extract errors from the answers of the written section of an Italian English as a second language (ESL) learners' corpus annotated with human scores and we automatically generated the respective correct versions. We found a moderate correlation between the presence of errors and the scores assigned by human experts. As such, we believe that error rate may be particularly suitable for automatic assessment tools. Therefore, we envisage the use of various state-of-the-art machine learning approaches, aiming at developing useful techniques for both ESL learners and teachers.
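The relationship between error rate and human scores can be quantified with a correlation coefficient. The abstract does not say which coefficient or which error-rate definition was used, so the sketch below assumes Pearson's r and errors per token; the function names and the sample numbers are invented for illustration and are not data from the paper.

```python
import math


def error_rate(num_errors, num_tokens):
    """Errors per token (one possible definition, assumed here)."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return num_errors / num_tokens


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented example: (errors, tokens) per answer and the matching human scores.
answers = [(12, 60), (8, 80), (5, 100), (2, 100)]
human_scores = [2.0, 3.0, 4.0, 5.0]

rates = [error_rate(e, t) for e, t in answers]
r = pearson(rates, human_scores)
print(f"Pearson r = {r:.2f}")  # negative: more errors, lower score
```

A negative coefficient is the expected direction here, since a higher error rate should go with a lower proficiency score.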

TLT-school: a Corpus of Non Native Children Speech

Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020

This paper describes "TLT-school", a corpus of speech utterances collected in schools of northern Italy for assessing the performance of students learning both English and German. The corpus was recorded in the years 2017 and 2018 from students aged between nine and sixteen years, attending primary, middle and high school. All utterances have been scored, in terms of some predefined proficiency indicators, by human experts. In addition, most of the utterances recorded in 2017 have been carefully transcribed by hand. Guidelines and procedures used for manual transcriptions of utterances will be described in detail, as well as results achieved by means of an automatic speech recognition system developed by us. Part of the corpus is going to be freely distributed to the scientific community, particularly to those interested both in non-native speech recognition and automatic assessment of second language proficiency.

«Si sonus cadit, tota scientia vadit»: Friedrich Schürr alle prese con il vocalismo nel dialetto di Nimis

Quaderni di filologia romanza, 2018

During the First World War, Friedrich Schürr became a member of the Romance section of the Königlich-Preussische Phonographische Kommission. On 17 and 18 June 1918 he visited the Hammelburg POW camp in Bavaria, where he recorded four Italian soldiers. The Austrian linguist's interest in the dialectal varieties of Emilia and Romagna is well known, so it is no coincidence that Schürr decided to record the voices of two Emilian prisoners and one from Romagna. The fourth informant was instead a Friulian soldier from Nimis (UD). After writing a first phonetic transcription of the Friulian recording while he was still in Hammelburg, Schürr drafted a second one, which he published in a 1930 article. A year later, Ugo Pellis heavily criticised Schürr's study. However, the Italian linguist was not aware of the existence of the first phonetic transcription, nor had he been able to listen to the corresponding gramophone recording. In this contribution, the entire documentation concerning the recording in question is compared and analysed.

Voci vive. Riaffiorano dagli archivi le prime registrazioni di voci dialettali italiane.

Studi Trentini. Storia, 2017

The first recordings of Italian dialect voices resurface from the archives. Last 10 February, the first conference on "Voci della Grande Guerra" was held at the Accademia della Crusca. The project, born from the collaboration between [...], aims to preserve and spread the memory of the Great War by building a large digital archive of texts and photographic and sound materials representative of Italy and the Italians during the First World War.

Voci e scritture di prigionieri italiani della Prima guerra mondiale

RID - Rivista di Dialettologia Italiana, 2018

This contribution presents the written materials and sound recordings of the Italian section of the Lautarchiv of the Humboldt University in Berlin. The section consists of phonographic recordings and dialect texts by Italian prisoners, produced in several German POW camps during the Great War and accompanied by the corresponding phonetic transcriptions written by the linguists involved in the surveys of the Königlich-Preußische Phonographische Kommission. In particular, it focuses on the linguistic elicitation systems employed in fieldwork and on the responses given by the interviewees to such stimuli.

Talks by Stefano Bannò

Languages and the First World War

Le voci ritrovate: le registrazioni dei prigionieri italiani della Grande Guerra dagli archivi sonori di Berlino

One hundred years after the end of the First World War, German, Austrian and Italian scholars gather in Udine (then the seat of the Italian command at the front) to present and discuss a corpus of sound recordings and documentary materials concerning Italian soldiers held in German prison camps during the Great War. Made in 1918 by a team of linguists, musicologists and ethnologists, the collection gathered the voices of dozens of Italian soldiers from almost every region of Italy. The result is an exceptional cross-section of a sample of the Italian people in terms of language, culture, literacy and traditions, which will be investigated from a range of perspectives.

La Grande Guerre des gens «ordinaires»: correspondances, récits, témoignages

Thesis Chapters by Stefano Bannò

Un corpus inedito. Le registrazioni fonografiche di parlati dialettali italiani nei campi di prigionia tedeschi della Grande Guerra. Scritture di prigionieri e trascrizioni di linguisti al Lautarchiv di Berlino.

Between 1 November 2016 and 1 February 2017, thanks to a scholarship from the Faculty of Arts and Philosophy of the University of Trento, I carried out research at the Lautarchiv of the Humboldt Universität in Berlin, an archive that holds 4503 phonographic recordings of stories, poems, monologues and songs in more than 250 languages and dialects from all over the world. Among the enormous filing cabinets in the small room on the first floor of the Institut für Musikwissenschaft und Medienwissenschaft at 5 Kupfergraben are 64 shellac discs containing 115 unpublished Italian dialect recordings, cut in several German prison camps during the Great War by the Königlich-Preussische Phonographische Kommission. This commission was founded in October 1915 and was made up of German, Austrian and Swiss linguists, anthropologists and ethnomusicologists, who had seen in the war a unique and unrepeatable opportunity: the cultures and languages of peoples both near and, above all, distant and exotic, often at risk of extinction, could be studied comparatively without undertaking laborious and costly expeditions across Europe and the colonial territories.
The written documentation accompanying the archive's recordings, in the form of rigorous protocols, attests to the scientific standard of this collection of spoken-language material. Almost all the recordings in the section I worked on come with phonetic transcriptions drawn up by the linguists who conducted their research in the prison camps (most recording sessions were supervised by the Romance scholar Hermann Urtel, but some were entrusted to the better-known Max Leopold Wagner and Friedrich Schürr); we also have the personal data sheets of the prisoners interviewed and the texts of the recordings written out by the prisoners themselves.
An entirely new element is the possibility of reading and analysing dialect documents written by non-linguists: dialect speakers whom the analysis of graphic deviations from the norm (in segmentation and in the use of diacritics, accents, capital letters and punctuation) reveals to be users of italiano popolare and therefore, in most cases, people with a low to medium level of schooling. In the texts of the prisoners, who were obliged to write in their own dialects rather than in Italian, several features emerge depending on the case: the conditioning of regional scriptae, the influence exerted by the "pressure" of Italian spelling norms, elements suggesting at least a passive competence in literary Italian (among all the passages that have come down to us, the only but most precious dialect version of Novella IX of Day I of the Decameron stands out) and, at times, possible graphic interference from foreign languages.
The rare opportunity to compare the dialect texts with the corresponding phonographic recordings also allows us to describe stages of Italian dialects that had never before been documented in sound. Together with the analysis of these written and sound sources, the study of the texts in phonetic transcription offers a further and highly relevant term of comparison.
Indeed, besides allowing us to analyse the prisoners' written and oral dialect awareness and competence, the written documentation of the Lautarchiv gives us a privileged view of the work of linguists engaged in field research. The phonetic transcriptions signed by Urtel and Schürr (Wagner's have not come down to us) are important evidence of their modus operandi during the recording sessions in the German camps. In most cases, a comparison of the sound recording, the dialect text and the phonetic transcription makes it clear that both linguists tended to produce phonetic transcriptions closer to the written texts, and consequently to the models used as elicitation systems (in particular the parable of the prodigal son, the novella from the Decameron and the Normalsätze), than to the content of the sound recordings, ultimately preferring a clear, readable and functional transcription to a completely faithful one.
The analysis of the texts in phonetic transcription also allows us to consider and assess some of the linguists' graphic choices in using the alphabet of the Association Phonétique Internationale in one of its earliest applications to Romance dialectology. Some sounds, which at the time had no corresponding symbols in that system of phonetic notation, were represented through combinations of symbols or graphic signs borrowed from other transcription systems.
It is worth stressing again the importance of the elicitation systems used in the field surveys. Urtel decided to use two of the "classic" methods of nineteenth- and early twentieth-century linguistics, the parable of the prodigal son and Novella IX of Day I of the Decameron, whereas Schürr used his own Normalsätze, modelled on another tool of late nineteenth-century linguistics, the Normalsätze that Wenker used for his Sprachatlas des deutschen Reichs. The presence of several elements traceable to the original texts (outright copying or dialectalisation of words, phrases and punctuation marks) suggests that the prisoners produced their translations directly from reading the written texts, apparently without the mediation of the linguist, who sometimes intervened with corrections only afterwards, on the drafted dialect versions. By contrast, poems, nursery rhymes and songs were most likely performed by the prisoners from memory, given the greater fluency and naturalness of delivery that are clearly evident in the sound recordings.
Of the dialectological study in the strict sense only a sample has been given, in the hope that each recording will in future be analysed in depth by dialectologists competent in the various dialect varieties recorded on the 64 discs presented here.
The ethnomusicological aspects of the material are being studied by Prof. Ignazio Macchiarella of the University of Cagliari, who will shortly publish the results of his research together with an edition of the sound material on CD-ROM.
At the same time, Prof. Serenella Baggio is preparing the edition of the protocols of the Austrian recordings made by Hans Ettmayer for the Academy of Sciences in Vienna.
An appeal that I already made at the first conference on the "Voci della Grande Guerra" project, held at the Accademia della Crusca last 10 February, is the proposal to acquire the materials of the Italian section of the Lautarchiv and the Italian recordings held in the other phonographic archives of Berlin, Vienna and Zurich, which have been or are being digitised. The project, born from the collaboration between the University of Pisa, the Institute for Computational Linguistics of the CNR in Pisa, the University of Siena and the Accademia della Crusca, aims to preserve and spread the memory of the Great War by building, by mid-2018, a large digital archive of texts and photographic and sound materials representative of Italy and the Italians during the First World War.
The rediscovery of Italian dialect voices that long remained unheard and forgotten, the first truly audible ones available to us, provides a new source of study for several fields, from dialectology to phonetics, from sociolinguistics to anthropology, from cultural studies to historical studies, all the more so now, one hundred years after that «inutile strage» ("useless slaughter") that devastated Europe.

Drafts by Stefano Bannò

L2 proficiency assessment using self-supervised speech representations

There has been a growing demand for automated spoken language assessment systems in recent years. A standard pipeline for this process is to start with a speech recognition system and derive features, either hand-crafted or based on deep learning, that exploit the transcription and audio. Though these approaches can yield high-performance systems, they require speech recognition systems that can be used for L2 speakers, and preferably tuned to the specific form of test being deployed. Recently, a self-supervised speech representation-based scheme, requiring no speech recognition, was proposed. This work extends the initial analysis conducted on this approach to a large-scale proficiency test, Linguaskill, that comprises multiple parts, each designed to assess different attributes of a candidate's speaking proficiency. The performance of the self-supervised, wav2vec 2.0, system is compared to a high-performance hand-crafted assessment system and a BERT-based text system, both of which use speech transcriptions. Though the wav2vec 2.0-based system is found to be sensitive to the nature of the response, it can be configured to yield comparable performance to systems requiring a speech transcription, and yields gains when appropriately combined with standard approaches.

Automatic Assessment of Conversational Speaking Tests

Proc. 9th Workshop on Speech and Language Technology in Education (SLaTE), 2023

Many speaking tests are conversational, dialogic, in form with an interlocutor talking to one or ... more Many speaking tests are conversational, dialogic, in form with an interlocutor talking to one or more candidates. This paper investigates how to automatically assess such a test. State-of-theart approaches are used in a multi-stage pipeline: diarization and speaker assignment, to detect who is speaking and when; automatic speech recognition (ASR), to produce a transcript; and finally assessment. Each presents challenges which are investigated in the paper. Advanced foundation model-based auto-markers are examined: an ensemble of Longformer-based models that operates on the ASR output text; and a wav2vec2based system that works directly on the audio. The two are combined to yield the final score. This fully automated system is evaluated in terms of ASR performance, and related impact of candidate assignment, as well as prediction of the candidate mark on data from the Occupational English Test. This is a conversational speaking test for L2 English healthcare professionals.

Assessment of L2 Oral Proficiency Using Self-Supervised Speech Representation Learning

Proc. 9th Workshop on Speech and Language Technology in Education (SLaTE), 2023

A standard pipeline for automated spoken language assessment is to start with an automatic speech... more A standard pipeline for automated spoken language assessment is to start with an automatic speech recognition (ASR) system and derive features that exploit transcriptions and audio. Although efficient, these approaches require ASR systems that can be used for second language (L2) speakers and preferably tuned to the specific form of test being deployed. Recently, a self-supervised speech representation-based scheme requiring no ASR was proposed. This work extends the initial analysis to a large-scale proficiency test, Linguaskill. The performance of a self-supervised, wav2vec 2.0, system is compared to a high-performance hand-crafted assessment system and a BERTbased system, both of which use ASR transcriptions. Though the wav2vec 2.0 based system is found to be sensitive to the nature of the response, it can be configured to yield comparable performance to systems requiring transcriptions and shows significant gains when appropriately combined with standard approaches.

Grammatical Error Correction for L2 Speech Using Publicly Available Data

9th Workshop on Speech and Language Technology in Education (SLaTE)

Over the past decades, the demand for learning English as a second language (L2) has grown consis... more Over the past decades, the demand for learning English as a second language (L2) has grown consistently, as it has gradually become the lingua franca of business, culture, entertainment, and academia. This aspect has contributed to an increasing demand for systems for automatic feedback for applications in Computer-Assisted Language Learning. In this regard, mastering grammar is a key element of L2 speaking proficiency. In this paper, we illustrate an approach to spoken grammatical error correction (GEC) in a cascaded fashion using only publicly available training data. Specifically, we start from learners' utterances, investigate disfluency detection, and finally explore GEC. We test this pipeline on NICT-JLE, a publicly available L2 corpus, and TLT-GEC, a private dataset that is under preparation for release. We obtain promising results which outperform previous studies that used large proprietary datasets, and we set a potential baseline for future experiments on spoken GEC.

Proficiency assessment of L2 spoken English using wav2vec 2.0

The increasing demand for learning English as a second language has led to a growing interest in methods for automatically assessing spoken language proficiency. Most approaches use hand-crafted features, but their efficacy relies on their particular underlying assumptions and they risk discarding potentially salient information about proficiency. Other approaches rely on transcriptions produced by ASR systems which may not provide a faithful rendition of a learner's utterance in specific scenarios (e.g., non-native children's spontaneous speech). Furthermore, transcriptions do not yield any information about relevant aspects such as intonation, rhythm or prosody. In this paper, we investigate the use of wav2vec 2.0 for assessing overall and individual aspects of proficiency on two small datasets, one of which is publicly available. We find that this approach significantly outperforms the BERT-based baseline system trained on ASR and manual transcriptions used for comparison.

View-Specific Assessment of L2 Spoken English

Proc. Interspeech 2022, 2022

The growing demand for learning English as a second language has increased interest in automatic approaches for assessing and improving spoken language proficiency. A significant challenge in this field is to provide interpretable scores and informative feedback to learners through individual viewpoints of learners' proficiency, as opposed to holistic scores. Thus far, holistic scoring remains commonly applied in large-scale commercial tests. As a result, an issue with more detailed evaluation is that human graders are generally trained to provide holistic scores. This paper investigates whether view-specific systems can be trained when only holistic scores are available. To enable this process, view-specific networks are defined where both their inputs and structure are adapted to focus on specific facets of proficiency. It is shown that it is possible to train such systems on holistic scores, such that they provide view-specific scores at evaluation time. View-specific networks are designed in this way for pronunciation, rhythm, text, use of parts of speech and grammatical accuracy. The relationships between the predictions of each system are investigated on the spoken part of the Linguaskill proficiency test. It is shown that the view-specific predictions are complementary in nature and capture different information about proficiency.

Cross-corpora experiments of automatic proficiency assessment and error detection for spoken English

Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022), 2022

The growing demand for learning English as a second language has led to an increasing interest in automatic approaches for assessing spoken language proficiency. One of the most significant challenges in this field is the lack of publicly available annotated spoken data. Another common issue is the lack of consistency and coherence in human assessment. To tackle both problems, in this paper we address the task of automatically predicting the scores of spoken test responses of English-as-a-second-language learners by training neural models on written data and using the presence of grammatical errors as a feature, as they can be considered consistent indicators of proficiency through their distribution and frequency. Specifically, we train a feature extractor on EFCAMDAT, a large written corpus containing error annotations and proficiency levels assigned by human experts, in order to extract information related to grammatical errors and, in turn, we use the resulting model for inference on the CLC-FCE corpus, on the ICNALE corpus, and on the spoken section of the TLT-school corpus, a collection of proficiency tests taken by Italian students. The work investigates the impact of the feature extractor on spoken proficiency assessment as well as the written-to-spoken approach. We find that our error-based approach can be beneficial for assessing spoken proficiency. The results obtained on the considered datasets are discussed and evaluated with appropriate metrics.

On Assessing and Developing Spoken 'Grammatical Error Correction' Systems

Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022), 2022

Spoken 'grammatical error correction' (SGEC) is an important process to provide feedback for second language learning. Due to a lack of end-to-end training data, SGEC is often implemented as a cascaded, modular system, consisting of speech recognition, disfluency removal, and grammatical error correction (GEC). This cascaded structure enables efficient use of training data for each module. It is, however, difficult to compare and evaluate the performance of individual modules, as preceding modules may introduce errors. For example, the GEC module input depends on the output of non-native speech recognition and disfluency detection, both challenging tasks for learner data. This paper focuses on the assessment and development of SGEC systems. We first discuss metrics for evaluating SGEC, both individual modules and the overall system. The system-level metrics enable tuning for optimal system performance. A known issue in cascaded systems is error propagation between modules. To mitigate this problem, semi-supervised approaches and self-distillation are investigated. Lastly, when an SGEC system is deployed, it is important to give accurate feedback to users. Thus, we apply filtering to remove edits with low confidence, aiming to improve overall feedback precision. The performance metrics are examined on a Linguaskill multi-level data set, which includes the original non-native speech, manual transcriptions and reference grammatical error corrections, to enable system analysis and development.
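The confidence-based filtering mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Edit` record, the threshold value, the toy edits and the simple exact-match precision are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    """A proposed correction; every field here is hypothetical."""
    source: str        # span in the learner's transcript
    target: str        # proposed replacement
    confidence: float  # system confidence, assumed to lie in [0, 1]


def filter_edits(edits, threshold):
    """Keep only the edits whose confidence is at least `threshold`."""
    return [e for e in edits if e.confidence >= threshold]


def precision(edits, reference):
    """Fraction of proposed edits that also occur in the reference set."""
    if not edits:
        return 0.0
    hits = sum((e.source, e.target) in reference for e in edits)
    return hits / len(edits)


# Toy data, invented for this example only.
proposed = [
    Edit("go", "went", 0.92),
    Edit("a informations", "information", 0.81),
    Edit("in", "on", 0.35),
]
reference = {("go", "went"), ("a informations", "information")}

# The threshold of 0.5 is an arbitrary illustrative choice.
kept = filter_edits(proposed, threshold=0.5)
print(precision(proposed, reference), precision(kept, reference))
```

Raising the threshold trades recall for precision: fewer edits reach the learner, but a larger share of them should be correct, which is the stated aim of the filtering step.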

Wilhelm Doegen and the Königlich-Preussische Phonographische Kommission

Towards error-based strategies for automatically assessing ESL learners' proficiency

Collated Papers for the 7th ALTE International Conference, 2021

In this paper we propose potential strategies for automatically assessing second language proficiency based on the presence of errors only. We used an open-source grammar and spelling check tool to extract errors from the answers of the written section of an Italian English as a second language (ESL) learners' corpus annotated with human scores, and we automatically generated the respective correct versions. We found a moderate correlation between the presence of errors and the scores assigned by human experts. As such, we believe that error rate may be particularly suitable for automatic assessment tools. Therefore, we envisage the use of various state-of-the-art machine learning approaches, aiming at developing useful techniques for both ESL learners and teachers.
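As a rough illustration of the analysis summarised above, the sketch below computes a per-answer error rate and its Pearson correlation with human scores. The abstract does not specify the error-rate definition or the correlation measure, so both are assumptions here, and the data are invented.

```python
import math


def error_rate(num_errors, num_tokens):
    """Errors per token; defined as 0.0 for an empty answer."""
    return num_errors / num_tokens if num_tokens else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


# Toy answers, invented for this example: (errors found, tokens, human score).
answers = [(8, 40, 2.0), (5, 50, 3.0), (3, 60, 4.0), (1, 50, 5.0)]

rates = [error_rate(errors, tokens) for errors, tokens, _ in answers]
scores = [score for _, _, score in answers]
r = pearson(rates, scores)
print(f"correlation between error rate and score: {r:.2f}")
```

In practice the error counts would come from the grammar and spelling checker applied to each answer; here they are hard-coded so that the example stays self-contained.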

TLT-school: a Corpus of Non Native Children Speech

Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), 2020

This paper describes "TLT-school", a corpus of speech utterances collected in schools of northern Italy for assessing the performance of students learning both English and German. The corpus was recorded in 2017 and 2018 from students aged between nine and sixteen, attending primary, middle and high school. All utterances have been scored by human experts in terms of some predefined proficiency indicators. In addition, most of the utterances recorded in 2017 have been carefully transcribed by hand. The guidelines and procedures used for the manual transcriptions are described in detail, as are the results achieved with an automatic speech recognition system that we developed. Part of the corpus is going to be freely distributed to the scientific community, in particular to those interested in non-native speech recognition and in automatic assessment of second language proficiency.

«Si sonus cadit, tota scientia vadit»: Friedrich Schürr alle prese con il vocalismo nel dialetto di Nimis

Quaderni di filologia romanza, 2018

During the First World War, Friedrich Schürr became one of the members of the Romance section of the Königlich-Preussische Phonographische Kommission. On 17 and 18 June 1918 he visited the Hammelburg POW camp in Bavaria, where he produced the recordings of four Italian soldiers. The interest of the Austrian linguist in the dialectal varieties of Emilia and Romagna is well known. It is therefore no coincidence that Schürr decided to record the voices of two Emilian prisoners and one from Romagna. The fourth informant was instead a Friulian soldier from Nimis (UD). After writing a first phonetic transcription of the Friulian recording while he was still in Hammelburg, Schürr drafted a second one, which he published in a 1930 article. A year later, Ugo Pellis heavily criticised Schürr's study. However, the Italian linguist was not aware of the existence of the first phonetic transcription, nor had he been able to listen to the corresponding gramophone recording. In this contribution the entire documentation concerning the recording in question is compared and analysed.

Voci vive. Riaffiorano dagli archivi le prime registrazioni di voci dialettali italiane.

Studi Trentini. Storia, 2017

On 10 February last, the first conference on "Voci della Grande Guerra" was held at the Accademia della Crusca. The project, born of a collaboration between [...], aims to preserve and disseminate the memory of the Great War through the creation of a large digital archive of texts, photographs and sound materials representative of Italy and the Italians during the First World War.

Voci e scritture di prigionieri italiani della Prima guerra mondiale

RID - Rivista di Dialettologia Italiana, 2018

This contribution presents the written materials and sound recordings of the Italian section of the Lautarchiv of the Humboldt University in Berlin, made up of phonographic recordings and dialect texts produced by Italian prisoners in several German POW camps during the Great War and accompanied by the respective phonetic transcriptions written by the linguists involved in the surveys of the Königlich-Preußische Phonographische Kommission. In particular, it focuses on the linguistic elicitation systems employed in fieldwork and on the responses given by the interviewees to such stimuli.

Languages and the First World War

Le voci ritrovate: le registrazioni dei prigionieri italiani della Grande Guerra dagli archivi sonori di Berlino

One hundred years after the end of the First World War, German, Austrian and Italian scholars meet in Udine (then the seat of the Italian command at the front) to present and discuss a corpus of sound recordings and documentary materials concerning Italian soldiers held in German prisoner-of-war camps during the Great War. Made in 1918 by a team of linguists, musicologists and ethnologists, the corpus collects the voices of dozens of Italian soldiers from almost every region of Italy. The result is an exceptional cross-section of a sample of the Italian people in terms of language, culture, literacy and traditions, which will be investigated from a range of points of view.

La Grande Guerre des gens «ordinaires»: correspondances, récits, témoignages

Un corpus inedito. Le registrazioni fonografiche di parlati dialettali italiani nei campi di prigionia tedeschi della Grande Guerra. Scritture di prigionieri e trascrizioni di linguisti al Lautarchiv di Berlino.

Between 1 November 2016 and 1 February 2017, thanks to a scholarship from the Faculty of Letters and Philosophy of the University of Trento, I carried out research at the Lautarchiv of the Humboldt Universität in Berlin, an archive that holds 4503 phonographic recordings of stories, poems, monologues and songs in more than 250 languages and dialects from all over the world. Among the enormous filing cabinets of the small room on the first floor of the Institut für Musikwissenschaft und Medienwissenschaft at 5 Kupfergraben are 64 shellac discs containing 115 unpublished Italian dialect recordings, cut in various German prisoner-of-war camps during the Great War by the Königlich-Preussische Phonographische Kommission. This commission, founded in October 1915, was made up of German, Austrian and Swiss linguists, anthropologists and ethnomusicologists who had seen in the war a unique and unrepeatable opportunity: the cultures and languages of peoples near and, above all, distant and exotic, often at risk of extinction, could be studied comparatively without undertaking arduous and costly expeditions around Europe and the colonial territories.
The written documentation accompanying the archive's recordings, in the form of rigorous protocols, attests to the scientific standard of the collection of spoken-language materials. Almost all the recordings in the section I worked on are accompanied by phonetic transcriptions drawn up by the linguists who conducted their research in the prisoner-of-war camps (most recording sessions were supervised by the Romance scholar Hermann Urtel, but some were entrusted to the better-known Max Leopold Wagner and Friedrich Schürr); we also have the personal record sheets of the prisoners interviewed and the texts of the recordings written out by the prisoners themselves.
A wholly new element is the possibility of reading and analysing dialect documents written by non-linguists: dialect speakers whom the analysis of graphic deviations from the norm (in segmentation, use of diacritics, accents, capital letters and punctuation) reveals to be users of italiano popolare and therefore, in most cases, people with a low-to-medium level of schooling. In the texts of the prisoners, who were required to write in their respective dialects rather than in Italian, we find, depending on the case, the conditioning of regional scriptae, influences exerted by the "pressure" of Italian spelling norms, elements suggesting at least a passive competence in literary Italian (among all the passages that have come down to us, the only, but very valuable, dialect version of Novella IX of Day I of the Decameron stands out) and, at times, possible graphic interference from foreign languages.
The rare opportunity to compare the dialect texts with the corresponding phonographic recordings also allows us to describe stages of Italian dialects never before documented in sound. Together with the analysis of these written and audio sources, the study of the texts in phonetic transcription offers a further, highly relevant term of comparison.
Indeed, besides allowing us to analyse the prisoners' written and oral dialect awareness and competence, the written documentation of the Lautarchiv gives us a privileged view of the work of the linguists engaged in fieldwork. The phonetic transcriptions signed by Urtel and Schürr (Wagner's have not survived) are important evidence of their modus operandi during the recording sessions in the German camps. In most cases, comparing the sound recording, the dialect text and the phonetic transcription makes it clear that both linguists tended to produce phonetic transcriptions closer to the written texts, and consequently to the models used as elicitation systems (in particular the parable of the prodigal son, the Decameron novella and the Normalsätze), than to the content of the sound recordings, ultimately preferring a clear, readable and functional transcription to a wholly faithful one.
The analysis of the texts in phonetic transcription also allows us to consider and assess some of the linguists' graphic choices in using the alphabet of the Association Phonétique Internationale in one of its first applications in Romance dialectology. Some sounds, which at the time had no corresponding symbols in that system of phonetic notation, were represented through combinations of symbols or graphic signs borrowed from other transcription systems.
It is worth stressing the importance of the elicitation systems used in the field surveys. Urtel decided to use two of the "classic" methods of nineteenth- and early twentieth-century linguistics, the parable of the prodigal son and Novella IX of Day I of the Decameron, whereas Schürr used his own Normalsätze, modelled on another tool of late nineteenth-century linguistics, the Normalsätze that Wenker used for his Sprachatlas des deutschen Reichs. The presence of various elements traceable to the original texts (outright copying or dialectalisation of words, phrases and punctuation marks) suggests that the prisoners translated directly from reading the written texts, apparently without the mediation of the linguist, who sometimes intervened only afterwards with corrections to the drafted dialect versions. By contrast, poems, nursery rhymes and songs were in all likelihood performed by the prisoners from memory, given the greater fluency and naturalness of delivery clearly audible in the sound recordings.
Of the dialectological study stricto sensu only a sample has been given, in the hope that each recording will in future be the subject of in-depth analysis by dialectologists competent in the different dialect varieties recorded on the 64 discs presented here.
The ethnomusicological aspects of the material are being studied by Prof. Ignazio Macchiarella of the University of Cagliari, who will shortly publish the results of his research together with an edition of the sound material on CD-ROM.
At the same time, Prof. Serenella Baggio is working on the edition of the protocols of Hans Ettmayer's Austrian recordings for the Academy of Sciences in Vienna.
An appeal I already made at the first conference on the "Voci della Grande Guerra" project, held at the Accademia della Crusca on 10 February last, is the proposal to acquire the materials of the Italian section of the Lautarchiv and the Italian recordings held in the other phonographic archives of Berlin, Vienna and Zurich, which are already digitised or in the process of being digitised. The project, born of a collaboration between the University of Pisa, the Institute of Computational Linguistics of the CNR in Pisa, the University of Siena and the Accademia della Crusca, aims to preserve and disseminate the memory of the Great War by building, by mid-2018, a large digital archive of texts, photographs and sound materials representative of Italy and the Italians during the First World War.
The rediscovery of Italian dialect voices, long unheard and forgotten and the first truly audible ones available to us, constitutes a new source of study for a range of disciplines, from dialectology to phonetics, from sociolinguistics to anthropology, from cultural studies to historical studies, all the more so now, one hundred years after that "useless slaughter" that devastated Europe.

L2 proficiency assessment using self-supervised speech representations

There has been a growing demand for automated spoken language assessment systems in recent years. A standard pipeline for this process is to start with a speech recognition system and derive features, either hand-crafted or based on deep learning, that exploit the transcription and audio. Though these approaches can yield high-performance systems, they require speech recognition systems that can be used for L2 speakers, and preferably tuned to the specific form of test being deployed. Recently, a self-supervised speech representation-based scheme, requiring no speech recognition, was proposed. This work extends the initial analysis conducted on this approach to a large-scale proficiency test, Linguaskill, that comprises multiple parts, each designed to assess different attributes of a candidate's speaking proficiency. The performance of the self-supervised wav2vec 2.0 system is compared to a high-performance hand-crafted assessment system and a BERT-based text system, both of which use speech transcriptions. Though the wav2vec 2.0-based system is found to be sensitive to the nature of the response, it can be configured to yield comparable performance to systems requiring a speech transcription, and yields gains when appropriately combined with standard approaches.
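The abstract does not say how the systems are combined. As one plausible reading, the sketch below linearly interpolates the scores of two graders and picks the interpolation weight that minimises RMSE against human scores on held-out data. The function names, the grid search and all scores are assumptions for illustration, not the paper's method.

```python
import math


def rmse(predicted, reference):
    """Root-mean-square error between two equal-length score lists."""
    squared = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))


def combine(scores_a, scores_b, weight):
    """Linear interpolation: weight * a + (1 - weight) * b, per response."""
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]


def best_weight(scores_a, scores_b, reference, steps=10):
    """Grid-search the interpolation weight that minimises RMSE."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda w: rmse(combine(scores_a, scores_b, w), reference),
    )


# Toy development-set scores, invented for this example only.
audio_scores = [3.0, 4.5, 2.0, 5.5]  # e.g. an audio-only grader
text_scores = [4.0, 3.5, 3.0, 4.5]   # e.g. a transcription-based grader
human_scores = [3.5, 4.0, 2.5, 5.0]

weight = best_weight(audio_scores, text_scores, human_scores)
combined = combine(audio_scores, text_scores, weight)
print(weight, rmse(combined, human_scores))
```

The weight should be tuned on data separate from the evaluation set; otherwise the combined system's reported gain would be optimistic.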