18 questions
0 votes, 0 answers, 33 views
pyAudioAnalysis speaker_diarization doesn't work well
I can play test.wav with ffplay and other media players.
Why doesn't speaker_diarization work well in my code, and how can I solve this problem?
from unittest import result
import numpy as np
import ...
0 votes, 0 answers, 35 views
RTTM files generated by diart.stream contain only speaker0 and speaker1
I use https://github.com/juanmc2005/diart for real-time speaker diarization.
When I run:
diart.stream input.wav --tau-active=0.555 --rho-update=0.422 --delta-new=1.517 --hf-token --output rttm_folder ...
0 votes, 0 answers, 49 views
Google text to speech and speech to text speaker diarization with AI
I am designing a Python math project for my students, where they are given a topic (e.g. properties of a circle) and should be able to interact with an AI to receive assistance on anything they ...
0 votes, 0 answers, 78 views
Celery tasks stuck when using --pool=prefork option
I am working on a project that involves processing audio files and generating transcripts using Celery tasks. The tasks are defined in the workers/process.py file. I have a generate_transcript task ...
0 votes, 1 answer, 1k views
Pyannote: Load and Apply Speaker Diarization Offline
I am trying to use pyannote's models offline.
I was loading and applying models like this:
from pyannote.audio import Pipeline
access_token = 'xxxxxxxxxxx'
model = Pipeline.from_pretrained(
"...
0 votes, 1 answer, 44 views
Speaker identification embeddings audio fragment length
I have a database of audio samples, each matched to a specific speaker, like:
nick_sample1.mp3
nick_sample2.mp3
...
nick_sampleN.mp3
john_sample1.mp3
john_sample2.mp3
...
john_sampleK.mp3
The task is to match a ...
0 votes, 0 answers, 60 views
Diarization with AWS streaming transcription (ReactJS)
I have a problem setting up diarization for AWS streaming transcription in ReactJS.
I use @aws-sdk/client-transcribe-streaming.
I set ShowSpeakerLabel to true, but when I log data ...
0 votes, 0 answers, 141 views
CUDA Out of Memory Error when Chunking Audio and Running Whisper Transcription in Python
I'm working on a Python script that transcribes audio files using a library that leverages CUDA for GPU acceleration. The transcription process involves chunking the audio file into smaller segments ...
2 votes, 1 answer, 190 views
Is there any way to transliterate Hindi audio to English using OpenAI Whisper?
I have a task where, given an audio file, I have to perform speaker diarization on it and then transcribe it accordingly.
For speaker diarization I am using pyannote, ...
2 votes, 1 answer, 4k views
RuntimeError: Library cublas64_12.dll is not found or cannot be loaded. While using WhisperX diarization
I was trying to use WhisperX to do speaker diarization. I did it successfully on Google Colab, but I'm encountering this error while trying to transcribe the audio file.
Traceback (most recent call last)...
0 votes, 0 answers, 180 views
Microsoft Cognitive Services Speech SDK JavaScript and C# Quickstart samples both give an error while enrolling a profile
We are using the Microsoft Cognitive Services Speech SDK JavaScript and C# Quickstart samples, and I am getting an error while enrolling a user profile.
"Activation Phrase is not matched"
Are any ...
0 votes, 0 answers, 105 views
How to use VitsModel with speaker embedding
I want to do TTS for German, and this code works perfectly:
from transformers import VitsModel, AutoTokenizer
import torch
model = VitsModel.from_pretrained("facebook/mms-tts-deu")
...
0 votes, 0 answers, 37 views
What pre-trained model can be used for speaker diarization in the Kazakh language?
I have a large dataset of call center recordings in the Kazakh language. I want to build a speaker diarization system. What pre-trained model would be useful for fine-tuning and inference? My dataset ...
1 vote, 0 answers, 371 views
Whisper and pyannote 3.1 : AttributeError: 'list' object has no attribute 'get'
I'm using this script to diarize and then transcribe speech using pyannote.audio and Whisper. With pyannote 2.1 it works perfectly, but when I change the version to the latest (3.1), I get ...
1 vote, 1 answer, 154 views
Azure Speech diarization fails to tag speakers properly until a long 7-second statement is spoken
The Azure Speech private preview for diarization used to set an “unknown” speaker tag until it recognised a long 7-second statement from a speaker; with the API in public preview, it started tagging ...
1 vote, 0 answers, 233 views
Why am I getting "index 0 is out of bounds for axis 0 with size 0" when using the pyAudioAnalysis library?
This question is about speaker diarization. I'm trying to make a script that separates an mp4 file into different segments depending on different speakers. (The input mp4 file contains the dialogue of ...
3 votes, 1 answer, 3k views
Way to do Offline Speaker Diarization with Hugging Face
I am looking for an offline / locally saved model for speaker diarization with Hugging Face, without authentication.
I have searched Google and found no relevant links.
Is there any link/...
1 vote, 0 answers, 174 views
How to add speaker labels in AWS Transcribe streaming websockets
I'm using the AWS Transcribe example from https://github.com/amazon-archives/amazon-transcribe-websocket-static with a simple modification to the websocket query string to add speaker labels.
The ...