João Paulo Teixeira

Papers by João Paulo Teixeira

Correlation between phonetic factors and linguistic events regarding a …

Proceedings of …

Braga, D.; Freitas, D.; Teixeira, JP; Barros, MJ; Latsh, V. 2001. “Correlation between Phonetic f...

Segmental durations predicted with a neural network

8th European Conference on Speech Communication and Technology (Eurospeech 2003)

Acoustical characterisation of the accented syllable in Portuguese, a contribution to the naturalness of speech synthesis

6th European Conference on Speech Communication and Technology (Eurospeech 1999)

Prediction of Fujisaki model's phrase commands

8th European Conference on Speech Communication and Technology (Eurospeech 2003)

This paper presents a model to predict the phrase commands of the Fujisaki model of the F0 contour for the Portuguese language. The location of phrase commands in the text is governed by a set of weighted rules. The amplitude (Ap) and timing (T0) of the phrase commands are predicted by separate neural networks. The features for both neural networks are discussed. Finally, a comparison between target and predicted values is presented.
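As a rough illustration of what the predicted Ap and T0 values feed into, the sketch below evaluates the phrase component of the standard Fujisaki model, in which each phrase command is an impulse of amplitude Ap at time T0 passed through the response Gp(t) = α² t exp(−αt) for t ≥ 0. The function names and the default α = 2.0 per second are assumptions chosen for illustration; the abstract gives neither, and this is not the authors' prediction code.

```python
import math


def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase control mechanism: alpha^2 * t * exp(-alpha * t).

    alpha = 2.0 is an illustrative value, not one taken from the paper.
    """
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def phrase_component(t, commands, alpha=2.0):
    """Sum the contribution of every phrase command at time t (in seconds).

    commands is a list of (Ap, T0) pairs: amplitude and onset time.
    The result is added to ln(Fb) in the log-F0 domain.
    """
    return sum(ap * phrase_response(t - t0, alpha) for ap, t0 in commands)


if __name__ == "__main__":
    commands = [(0.5, 0.1), (0.3, 1.8)]  # hypothetical (Ap, T0) values
    for step in range(0, 30, 5):
        t = step / 10.0
        print(f"t={t:.1f}s  phrase component={phrase_component(t, commands):.4f}")
```

Each response peaks 1/α seconds after its onset and then decays, which is why the timing T0 matters as much as the amplitude Ap.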

Evaluation of a system for F0 contour prediction for European Portuguese

This paper presents the evaluation of a system for speech F0 contour prediction for European Portuguese using the Fujisaki model. It is composed of two command-generating subsystems: the phrase command subsystem and the accent command subsystem. The parameters for evaluating the ability of each subsystem are described. A comparison is made between original and predicted F0 contours. Finally, the results of a perceptual test are discussed.

Reading Numbers System for Portuguese Language

International Journal of Reliable and Quality E-Healthcare, 2015

The paper presents an algorithm for reading common numbers up to one million in Portuguese. The recording and cutting of the digit speech sounds received special attention to improve the speech output. Special attention is also required for the correct inclusion of the particle 'e' (and) to make the read numbers sound more natural. The system is able to simulate human speech production in the task of reading numbers. It is based on the concatenation of carefully recorded, edited, and selected speech segments corresponding to the digits. The naturalness of the system was improved by using speech files of digits read in different positions (beginning, middle, and end) and digits concatenated with the particle 'e' before, after, and both before and after the digit.
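To make the role of the particle 'e' concrete, here is a minimal sketch that produces the word sequence for a number up to one million, using European Portuguese spellings. It only covers the text side: the paper's system concatenates recorded speech segments, which is not reproduced here. The function names and the rule used after "mil" (insert 'e' when the remainder is below 100 or an exact multiple of 100) are assumptions based on common usage, not taken from the paper.

```python
UNITS = ["zero", "um", "dois", "três", "quatro", "cinco", "seis", "sete",
         "oito", "nove", "dez", "onze", "doze", "treze", "catorze", "quinze",
         "dezasseis", "dezassete", "dezoito", "dezanove"]
TENS = {2: "vinte", 3: "trinta", 4: "quarenta", 5: "cinquenta",
        6: "sessenta", 7: "setenta", 8: "oitenta", 9: "noventa"}
HUNDREDS = {1: "cento", 2: "duzentos", 3: "trezentos", 4: "quatrocentos",
            5: "quinhentos", 6: "seiscentos", 7: "setecentos",
            8: "oitocentos", 9: "novecentos"}


def below_thousand(n):
    """Words for 1..999; hundreds, tens and units are joined with 'e'."""
    if n == 100:
        return "cem"
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(HUNDREDS[hundreds])
    if rest:
        if rest < 20:
            parts.append(UNITS[rest])
        else:
            tens, units = divmod(rest, 10)
            parts.append(TENS[tens])
            if units:
                parts.append(UNITS[units])
    return " e ".join(parts)


def read_number(n):
    """Words for 0..1_000_000 (illustrative sketch, not the paper's algorithm)."""
    if not 0 <= n <= 1_000_000:
        raise ValueError("only 0..1000000 is supported")
    if n == 0:
        return "zero"
    if n == 1_000_000:
        return "um milhão"
    thousands, rest = divmod(n, 1000)
    words = []
    if thousands:
        words.append("mil" if thousands == 1 else below_thousand(thousands) + " mil")
    if rest:
        # Assumed rule: 'e' follows "mil" only for short or round remainders.
        if thousands and (rest < 100 or rest % 100 == 0):
            words.append("e")
        words.append(below_thousand(rest))
    return " ".join(words)


if __name__ == "__main__":
    for value in (21, 100, 123, 1001, 1234, 2500):
        print(value, "->", read_number(value))
```

In a concatenative system, each word in the output (including each 'e') would select a recorded segment, which is where the position-dependent recordings described in the abstract come in.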

Pause Duration of Disfluent Speech

International Journal of Reliable and Quality E-Healthcare, 2013

This work compares pause duration in disfluent and normal speech. Disfluent and normal spontaneous speech was recorded in a context where the subjects had to describe a scene to each other. The pause determination algorithm is presented. Automatic pause determination allowed the percentage of silence to be measured over recordings of several minutes of speech. A stuttering scale is used to compare severity across subjects. As expected, the speech pause percentage parameters are rather different in subjects with and without disfluent speech, but they do not seem to be proportional to the severity of the disfluency.
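The abstract does not describe the pause determination algorithm itself. As a stand-in, the sketch below shows one common way to estimate the percentage of silence: split the signal into fixed-length frames and count those whose mean energy falls below a threshold. The function name, the frame length, and the threshold are all hypothetical choices, not the paper's.

```python
def silence_percentage(samples, frame_len, threshold):
    """Percentage of full frames whose mean squared amplitude is below threshold.

    This is a generic energy-threshold detector, used here only as an
    illustration; the paper's own algorithm may differ.
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("signal is shorter than one frame")
    silent = sum(1 for frame in frames
                 if sum(x * x for x in frame) / frame_len < threshold)
    return 100.0 * silent / len(frames)


if __name__ == "__main__":
    # Synthetic signal: 100 silent samples followed by 100 loud samples.
    signal = [0.0] * 100 + [1.0] * 100
    print(silence_percentage(signal, frame_len=50, threshold=0.01))
```

On real recordings the threshold would need to be tuned to the noise floor, and very short silent stretches would usually be discarded before being counted as pauses.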

Accuracy of Jitter and Shimmer Measurements

Procedia Technology, 2014

New approach of the ANN methodology for forecasting time series: use of time index

In previous publications, the authors reported their work with artificial neural network (ANN) methodologies for forecasting time series of guest nights in hotels. The ANN methodology produced more accurate predictions than other methodologies [1, 5]. However, as a consequence of the increase in tourism demand in recent years, these time series registered an unusual increase in their values. Because the ANN methodology uses the past to predict the future statistically, it became very difficult for the ANN to predict values never seen before. In this paper the authors report a new approach to the ANN methodology that uses time in the input instead of the previous 12 registered observations, as is usual. The authors intend to capture the time variation of the series over the years and use this parameter as the input. The paper compares the classic usage of the ANN methodology with a new model formulation that uses the year and month in the input. The new formulation consists of four variations of the ANN input: A, just the month; B, year and month; C, a combination of A and the classic model; and D, a combination of B and the classic model. Models B and D improved the forecasting performance over the classic model, with mean relative errors of 5.98% and 5.79% in the test set, against 6.36% for the classic model.
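The difference between the classic input and variations A to D can be sketched as a feature-building function. The code assumes a monthly series that starts in January of a known year; the function name, the argument names, and the ordering of the features within each vector are illustrative assumptions, since the abstract does not specify them.

```python
def build_input(series, t, start_year, variant):
    """Build the ANN input vector for predicting series[t].

    series     : monthly observations, index 0 = January of start_year (assumed)
    t          : index of the month to predict, must be >= 12
    variant    : "classic", "A", "B", "C" or "D"
    """
    if t < 12:
        raise ValueError("need 12 previous observations")
    year = start_year + t // 12
    month = t % 12 + 1
    classic = list(series[t - 12:t])  # previous 12 registered observations
    if variant == "classic":
        return classic
    if variant == "A":  # just the month
        return [month]
    if variant == "B":  # year and month
        return [year, month]
    if variant == "C":  # A combined with the classic input
        return [month] + classic
    if variant == "D":  # B combined with the classic input
        return [year, month] + classic
    raise ValueError(f"unknown variant: {variant}")


if __name__ == "__main__":
    series = list(range(36))  # placeholder values, not real guest-night data
    for variant in ("classic", "A", "B", "C", "D"):
        print(variant, build_input(series, 14, 2000, variant))
```

Including the year gives the network an input that keeps growing with the series, which is the property the abstract relies on to handle values larger than any seen in training.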

Behaviour of Back Closed Non-Tonic Vowel [u] in European Portuguese: between a reduction and a neutralization

danielabraga.com

Braga, D.; Freitas, D.; Teixeira, JP; Barros, MJ; Latsh, V. 2001. Back Close Non-Syllabic Vowel ...

Divisão silábica automática do texto escrito e falado

Encontro para o …, 2000

This article presents an algorithm that performs syllable splitting automatically, as one stage of a more extensive work: the study of prosodic models for European Portuguese, within the development of a speech synthesizer. The syllable splitting algorithm was designed for two distinct situations: in the first it is applied to written text, and in the second to the sequence of phonemes actually produced when that same text is spoken. Each application has its own peculiarities and difficulties, which are described together with the solutions adopted to resolve them. An error rate of 0.06% is obtained in the first case and 0.89% in the second. The algorithm considers syllables of types V, VC, VCC, CV, CVC, CCV, and CCVC, where V is a vowel or diphthong and C is a consonant. These types are assumed to cover all existing syllables in Portuguese.
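The seven syllable types can be illustrated with a small helper that maps an already separated syllable to its consonant/vowel pattern and checks it against the allowed set. This is not the splitting algorithm from the paper, which is not given in the abstract. The vowel inventory and the simplification that any run of adjacent vowel letters counts as a single V (a vowel or diphthong) are assumptions made for the sketch.

```python
# Assumed vowel letters for written Portuguese (illustrative, not exhaustive).
VOWELS = set("aeiouáàâãéêíóôõú")
ALLOWED_TYPES = {"V", "VC", "VCC", "CV", "CVC", "CCV", "CCVC"}


def cv_pattern(syllable):
    """Return the C/V pattern of a syllable, merging adjacent vowels into one V."""
    pattern = []
    for ch in syllable.lower():
        symbol = "V" if ch in VOWELS else "C"
        if symbol == "V" and pattern and pattern[-1] == "V":
            continue  # treat a vowel run as a single vowel or diphthong
        pattern.append(symbol)
    return "".join(pattern)


def is_allowed(syllable):
    """True if the syllable matches one of the seven types listed in the abstract."""
    return cv_pattern(syllable) in ALLOWED_TYPES


if __name__ == "__main__":
    for syllable in ("a", "pai", "pra", "três", "mar"):
        print(syllable, cv_pattern(syllable), is_allowed(syllable))
```

A check like this is mainly useful as a sanity test on the output of a splitter: any syllable whose pattern falls outside the set signals either a splitting error or a case the type inventory does not cover.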

Modelização paramétrica de sinais para aplicação em sistemas de …

ProGmatica: a prosodic and pragmatic database for European Portuguese

In this work, a spontaneous speech corpus of broadcast television material in European Portuguese (EP) is presented. We decided to name it ProGmatica as it is meant to combine prosodic information under a pragmatic framework. Our purpose is to analyse, describe, and predict the prosodic patterns involved in speech acts and discourse events. It is also our goal to relate both prosody and pragmatics to emotion, style, and attitude. In future developments, we intend in this way to provide EP TTS systems with pragmatic and emotional dimensions. From the whole recorded material we selected, extracted, and saved prototypical speech acts with the help of speech analysis tools. We have a multi-speaker corpus in which linguistic, paralinguistic, and extralinguistic information is labelled and related to each other. The paper is organized as follows. Section one gives a brief state of the art of the available EP corpora containing prosodic information. Section two explains the pragmatic criteria used to structure this database, and then describes how the speech signal was labelled and which information layers were considered. Section three proposes a prosodic prediction model to be applied to each speech act in the future. Section four discusses some of the main problems we encountered and presents future work.

End-To-End Speech Synthesis Applied to Brazilian Portuguese

ArXiv, 2020

Voice synthesis systems are popular in different applications, such as personal assistants, GPS applications, screen readers, and accessibility tools. Voice provides a natural way for human-computer interaction. However, not all languages are on the same level in terms of resources and systems for voice synthesis. This work consists of the creation of publicly available resources for Brazilian Portuguese in the form of a dataset and deep learning models for end-to-end voice synthesis. The dataset has 10.5 hours from a single speaker. We investigated three different architectures to perform end-to-end speech synthesis: Tacotron 1, DCTTS, and Mozilla TTS. We also analysed the performance of models according to different vocoders (RTISI-LA, WaveRNN, and Universal WaveRNN), phonetic transcription usage, transfer learning (from English), and denoising. In the proposed scenario, a model based on Mozilla TTS and the RTISI-LA vocoder presented the best performance, achieving a 4.03 MOS value. ...

Vocal Acoustic Analysis

International Journal of E-Health and Medical Communications, 2020

Vocal acoustic analysis is becoming a useful tool for the classification and recognition of laryngological pathologies. This technique enables a non-invasive and low-cost assessment of voice disorders, allowing a more efficient, fast, and objective diagnosis. In this work, ANN and SVM were tested for classifying between dysphonic/control and vocal cord paralysis/control. A feature vector was made up of 4 jitter parameters, 4 shimmer parameters, and a harmonic-to-noise ratio (HNR), determined from 3 different vowels at 3 different tones, for a total of 81 features. Variable selection and dimension reduction techniques such as hierarchical clustering, multilinear regression analysis, and principal component analysis (PCA) were applied. The classification between dysphonic and control was made with an accuracy of 100% for female and male groups with ANN and SVM. For the classification between vocal cord paralysis and control, an accuracy of 78.9% was achieved for the female group with SVM, and...

Long Short Term Memory on Chronic Laryngitis Classification

Procedia Computer Science, 2018

Classification using machine learning has been applied for years, and one of its applications is the analysis of speech acoustics for the study of pathologies. One such pathology is chronic laryngitis. This article presents the results of classifying chronic laryngitis using Long Short Term Memory (LSTM) as a classifier. Relative jitter, relative shimmer, and autocorrelation were used as inputs to the LSTM. A dataset of about 1500 instances was used to train, validate, and test across 4 experiments with LSTM and one feedforward Artificial Neural Network (ANN). The LSTM results surpassed those of the feedforward ANN, reaching about 100% accuracy, sensitivity, and specificity in the test set, which suggests a promising future for this classification tool in the diagnosis of voice pathologies.
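For readers unfamiliar with the input parameters, the sketch below computes relative jitter and relative shimmer using their usual definitions: the mean absolute difference between consecutive glottal periods (or peak amplitudes), divided by the mean period (or amplitude), expressed as a percentage. The function names are illustrative, and the period and amplitude sequences are assumed to have been extracted already; the abstract does not say how the authors obtained them.

```python
def relative_perturbation(values):
    """100 * mean(|v[i] - v[i+1]|) / mean(v), in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    mean_diff = sum(abs(a - b) for a, b in zip(values, values[1:])) / (len(values) - 1)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def relative_jitter(periods):
    """Cycle-to-cycle variation of the glottal period lengths, in percent."""
    return relative_perturbation(periods)


def relative_shimmer(amplitudes):
    """Cycle-to-cycle variation of the peak amplitudes, in percent."""
    return relative_perturbation(amplitudes)


if __name__ == "__main__":
    periods_ms = [10.0, 10.2, 9.9, 10.1, 10.0]    # hypothetical glottal periods
    amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]   # hypothetical peak amplitudes
    print(f"relative jitter  = {relative_jitter(periods_ms):.2f}%")
    print(f"relative shimmer = {relative_shimmer(amplitudes):.2f}%")
```

A perfectly periodic signal yields 0% for both measures, so larger values indicate a less regular vocal fold vibration.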

Electroencephalogram Cepstral Distances in Alzheimer's Disease Diagnosis

Procedia Computer Science, 2015

On the Use of Prosodic Labelling in Corpus-Based Linguistic Studies of Spontaneous Speech

Lecture Notes in Computer Science, 2003

This paper addresses the construction of a spontaneous speech corpus in European Portuguese (hereafter EP). The corpus is presented and the proposed prosodic labelling scheme is explained. The objective of this work is to provide a tool for linguistic analysis suited to several research topics that have speech and dialogue as their objects. The main features considered in the database are described and justified. Methodological problems and some observed prosodic and pragmatic phenomena arising from the labelling of the speech signal are also presented. Some applications to pragmatic studies, speech synthesis, and prosodic phonology are discussed. Our purpose is to make this work available to the scientific community, since no other database of this kind is available and electronically accessible for EP. Future perspectives of the ongoing work are also outlined.

Evaluation of a Segmental Durations Model for TTS

Lecture Notes in Computer Science, 2003

In this paper we present a condensed description of a European Portuguese segmental duration model for TTS purposes and concentrate on its evaluation. This model is based on artificial neural networks. The quality of the model was evaluated by comparison with read speech. The standard deviation reached in the test set is 19.5 ms and the linear correlation coefficient is 0.84. In the perceptual evaluation, the model scored 4.12 against 4.30 for natural human read speech on a 5-point scale.
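The two objective figures quoted above can be computed as in the following sketch. It assumes that "standard deviation" refers to the standard deviation of the error between predicted and measured segment durations, and that the correlation is Pearson's linear coefficient; the abstract does not spell out either definition, and the function name and data are hypothetical.

```python
import math


def evaluate_durations(measured, predicted):
    """Return (standard deviation of the error, Pearson correlation).

    Both sequences hold segment durations in milliseconds, in the same order.
    """
    n = len(measured)
    if n < 2 or n != len(predicted):
        raise ValueError("need two sequences of equal length >= 2")
    errors = [p - m for p, m in zip(predicted, measured)]
    mean_error = sum(errors) / n
    std_error = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / n)

    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    correlation = cov / math.sqrt(var_m * var_p)
    return std_error, correlation


if __name__ == "__main__":
    measured = [52.0, 81.0, 120.0, 64.0, 95.0]    # hypothetical durations (ms)
    predicted = [60.0, 75.0, 110.0, 70.0, 99.0]
    std_error, r = evaluate_durations(measured, predicted)
    print(f"std of error = {std_error:.1f} ms, r = {r:.2f}")
```

The two measures are complementary: a constant offset in the predictions leaves both unchanged, so a bias check on the mean error is usually reported alongside them.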

Evaluation of a Neural Network Segmental Durations Model for Portuguese

This paper describes and evaluates what is, as far as the authors know, the first published segmental duration model for European Portuguese for TTS purposes. The model is based on artificial neural networks trained with the resilient backpropagation algorithm. Using a substantial amount of training data and a selected set of input factors, the standard deviation reaches 19 ms in several paragraphs, with a linear correlation above 0.9. The paper presents the methodology, the topology of the neural network, and the training algorithm, and discusses the importance of the factors used, the evaluation of the model, and a comparison with other models.
