This paper describes pitch estimation of Marathi spoken numbers, using features extracted from various speech signals. The speech frequencies of Marathi spoken numbers are acquired from various male and female speakers. The pitch frequencies are normalized using the PRAAT tool. The pitch contours are compared with a pitch detector. The autocorrelation and cepstral methods are used to estimate speech frequency. Pitch detection is evaluated by statistical methods and similarity is measured by Euclidean distance. The pitch frequency results were found to be satisfactory. The average mean of frequency varies from 1.48 to 2.03 Hz and the standard deviation varies from 0.84 to 1.38 Hz.
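The autocorrelation method and the Euclidean distance measure named in this abstract can be sketched as follows. This is a minimal illustration and not the paper's implementation: the function names, the 75 to 500 Hz search range and the single-frame interface are assumptions made here for the example.

```python
import numpy as np


def estimate_pitch_autocorr(frame, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate the pitch (Hz) of one voiced frame by autocorrelation.

    fmin and fmax bound the search range; they are illustrative defaults,
    not values taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove any DC offset
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(acf) - 1)
    # The strongest peak inside the allowed lag range marks the pitch period.
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length pitch contours."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum((a - b) ** 2)))
```

In practice a recording would be split into short overlapping frames, the estimator applied to each voiced frame, and the resulting contours compared with the distance function.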
International journal of computer applications, Oct 20, 2012
This paper deals with pitch estimation of spoken Devnagari vowels from the original speech signals. Devnagari vowels play a vital role in the pronunciation of any word. Each vowel is classified as starting, middle or end according to the duration of its occurrence in the word. The Devnagari script, which has 12 vowels and 34 consonants, is used in some Indian languages such as Marathi. The Devnagari vowels are categorised into five types: short vowels, long vowels, conjunct vowels, nasal vowel and visarg vowel. The pitch frequency is estimated from the features of speech signals via a pitch detection algorithm using autocorrelation and cepstral methods. The vowels were recorded with the PRAAT tool in a noisy environment. The estimate of the original pitch frequency has been calculated statistically (mean and standard deviation). The implementation, experiments and discussion of results are also presented. The results of the two techniques are in appropriate agreement.
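The cepstral method mentioned in this abstract can be illustrated with a short sketch. It is not the authors' code: the function name, the Hamming window and the 75 to 500 Hz search range are assumptions chosen for the example.

```python
import numpy as np


def estimate_pitch_cepstrum(frame, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate the pitch (Hz) of one voiced frame from its real cepstrum.

    fmin and fmax bound the search range; they are illustrative defaults.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-12)
    cepstrum = np.fft.irfft(log_mag)
    # Search only the quefrency range that corresponds to plausible pitch.
    q_min = int(sample_rate / fmax)
    q_max = min(int(sample_rate / fmin), len(cepstrum) // 2)
    peak = q_min + int(np.argmax(cepstrum[q_min:q_max + 1]))
    return sample_rate / peak
```

The cepstral peak relies on the harmonic structure of voiced speech, so the method suits vowels well; a frame with little harmonic content gives no clear peak.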
Abstract - More than 15001 different languages are spoken in India. Vocal and linguistic technology would be quite beneficial to most of them. Indian languages have unique features and similarities that must be used in order to provide voice recognition capability for these low-resource languages. A low-resource Automatic Speech Recognition system for Indian languages was developed with this objective in mind. Rapid progress has been achieved in speaker diarization across a variety of application domains. A historical assessment of speaker diarization technology, as well as contemporary advances in neural speaker diarization techniques, is presented in this study. By integrating recent advancements in neural techniques, we believe that this study contributes to the community by providing valuable insights and paving the way for advancements in the field of speaker diarization, ultimately leading to more efficient and accurate results. Keywords: ASR, Speaker diarization, MFCC, LSTM, Deep learning.
Proceedings of the 3rd International Conference on ICT for Digital, Smart, and Sustainable Development, ICIDSSD 2022, 24-25 March 2022, New Delhi, India
Epilepsy is a neurological brain disorder caused by a temporary change in the brain's electrical function. If it is diagnosed and treated, patients can be free of seizures. The electroencephalogram (EEG) is the most common technique used in diagnosing epilepsy, so that danger can be avoided and preventive precautions taken. This paper applies deep learning and machine learning techniques to detect epileptic seizures, identifies whether machine learning or deep learning classifiers are more pertinent for the purpose, and then tries to improve the present techniques for seizure detection. The best performance among the deep learning models was achieved by the convolutional neural network (CNN) on the EEG signal dataset: accuracy 99.2%, specificity 99.3% and sensitivity 98.7%. For the hybrid deep neural network combining a CNN with long short-term memory (LSTM), the accuracy reached is 98.7%.
International Journal for Research in Applied Science and Engineering Technology
Chatbots are the next major feature of the age of conversational services in the new era of technology. A chatbot framework is a software program that uses natural language to communicate with users. A chatbot is a virtual entity that can effectively explore the use of digital textual competencies with any human being. Recently, their growth as a medium of conversation between people and computers has taken a great step forward. The aim of a chatbot framework built on machine learning and artificial intelligence is to simulate a human conversation, through either text or speech. Through Natural Language Processing, chatbot software understands one or more human languages. To simulate informal chat communication, the chatbot structure combines a language model and computational algorithms covering a wide range of natural language processing techniques. This paper discusses applications in which chatbots may be useful, such as computer conversation systems, virtual agents, dialogue systems, information retrieval, industry, telecommunications, banking, health, customer call centers, and e-commerce. It also offers an overview of cloud-based chatbot technology, along with the programming of chatbots and the programming problems of chatbots at present and in the future.
2021 International Conference on Computational Intelligence and Computing Applications (ICCICA), 2021
Speech recognition and communication between humans and computers have made tremendous progress over the last three decades. Speech recognition technologies allow a machine to respond correctly to a human voice. Many Automatic Speech Recognition (ASR) systems have now been developed that are more resistant to environmental, speaker, and language variability. Voice-based applications provide valuable and useful services to the user. Deep learning is an emerging area, and in the last few years research has focused on using it for different speech-related applications. Feature extraction, speech classifiers, speech representation, speech databases, and performance are some of the important issues that should be considered while designing a speech recognition system. The challenges that exist in ASR, as well as the different methods developed by various researchers, are described in sequence. This paper explores the significant advances in speech communication research over the years and helps to identify different tools along with their merits and demerits. The primary aim of the article is to compare various speech recognition methods. The paper sheds light on the trends in speech recognition systems and also brings focus to new research topics.
This paper describes the implementation of a natural-sounding speech synthesizer for the Marathi language using English script. The synthesizer is developed using unit selection, based on the concatenative synthesis approach. Its purpose is to produce natural-sounding Marathi speech by computer. The natural Marathi words and sentences were acquired from 'Marathi Wordnet', because Marathi linguists refer to Wordnet. The system uses around 28,580 syllables, natural words and sentences. The natural syllables were spoken by one female speaker. The voice signals were recorded through a standard Sennheiser HD449 wired headphone using the PRAAT tool with a sampling frequency of 22 kHz. The ETMS-system was tested and generated the natural output as well as the waveform. The formant frequencies (F1, F2 and F3) were also determined with the MATLAB and PRAAT tools. The formant frequency results were found to be satisfactory.
2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), 2022
This research paper proposes a smart system based on deep learning to detect COVID-19 patients using the cough sound. Deep neural networks are used to distinguish between COVID-19 positive and negative coughs. The proposed system is segmented into three stages: audio pre-processing by noise reduction, segmentation, feature extraction, classification, and model deployment. Eight features were extracted from 1635 sound subjects: 573 COVID-19 positive and 1062 negative coughs. The extracted feature data were used to train two models: the first, a cough detection model based on an ANN, distinguishes whether a cough is present or not; the second detects COVID-19 using a Convolutional Neural Network. The overall accuracy is 98.1% for the Cough model and 98.5% for the COVID-19 model. The models were combined after deployment to work together as a web service based on Flask. The Cough model receives a cough sound from the mobile app or web interface and determines whether a cough is present; if so, it passes the sound to the COVID-19 model, which analyzes whether the cough is positive or negative and sends the result back to the mobile app.
This paper reports the development of the soil spectral signature using a Spectroradiometer, from the Visible Near Infrared, Short Wave Infrared and Mid-Infrared spectral reflectance of soil. Soil properties such as the amount of carbon, nitrogen, phosphorus and potassium, and the sand, silt, and clay content, have been determined by using hyperspectral band models in the wavelength band of 350-2500 nm. Different mathematical models, such as principal component analysis (PCA) and partial least square regression (PLSR), have been widely used to extract information regarding soil properties. This review article analyzes the reference work from 2005 to 2015.
Papers by Sunil Nimbhore