0 votes
0 answers
33 views

pyAudioAnalysis speaker_diarization doesn't work well

I can play test.wav with ffplay and other media players. Why doesn't speaker_diarization work well in my code, and how can I solve this problem? from unittest import result import numpy as np import ...
ju jinqian's user avatar
0 votes
0 answers
35 views

RTTM files generated by diart.stream contain only speaker0 and speaker1

I use https://github.com/juanmc2005/diart for real-time speaker diarization. When I run: diart.stream input.wav --tau-active=0.555 --rho-update=0.422 --delta-new=1.517 --hf-token --output rttm_folder ...
guresha's user avatar
0 votes
0 answers
49 views

Google text to speech and speech to text speaker diarization with AI

I am designing a Python math project for my students, where they are given a topic, e.g. properties of a circle, and they should be able to interact with AI to receive assistance on anything they ...
J Mehta's user avatar
0 votes
0 answers
78 views

Celery tasks stuck when using --pool=prefork option

I am working on a project that involves processing audio files and generating transcripts using Celery tasks. The tasks are defined in the workers/process.py file. I have a generate_transcript task ...
Mehn 's user avatar
0 votes
1 answer
1k views

Pyannote: Load and Apply Speaker Diarization Offline

I am trying to use pyannote's models offline. I was loading and applying models like this: from pyannote.audio import Pipeline access_token = 'xxxxxxxxxxx' model = Pipeline.from_pretrained( "...
Tütü's user avatar
  • 3
0 votes
1 answer
44 views

Speaker identification embeddings audio fragment length

I have a collection of audio samples, each matched to a specific speaker, like nick_sample1.mp3 nick_sample2.mp3 ... nick_sampleN.mp3 john_sample1.mp3 john_sample2.mp3 ... john_sampleK.mp3 The task is to match a ...
Anton Maiorov's user avatar
0 votes
0 answers
60 views

Diarization with AWS streaming transcription (ReactJS)

I have a problem setting up diarization for AWS streaming transcription in ReactJS. I use @aws-sdk/client-transcribe-streaming. I set the ShowSpeakerLabel to true, but when I log data....
Hoang Pham's user avatar
0 votes
0 answers
141 views

CUDA Out of Memory Error when Chunking Audio and Running Whisper Transcription in Python

I'm working on a Python script that transcribes audio files using a library that leverages CUDA for GPU acceleration. The transcription process involves chunking the audio file into smaller segments ...
Ruffy's user avatar
  • 1
2 votes
1 answer
190 views

Is there any way to transliterate Hindi audio to English using OpenAI Whisper?

I have a task where, given an audio file, I have to perform speaker diarization on it and then transcribe it accordingly. For speaker diarization I am using pyannote, ...
Chaitanya Kale's user avatar
2 votes
1 answer
4k views

RuntimeError: Library cublas64_12.dll is not found or cannot be loaded. While using WhisperX diarization

I was trying to use whisperx to do speaker diarization. I did it successfully on Google Colab, but I'm encountering this error while trying to transcribe the audio file. Traceback (most recent call last)...
St.Destiny's user avatar
0 votes
0 answers
180 views

Microsoft Cognitive Services Speech SDK JavaScript and C# Quickstart samples both giving error while enrolling profile

We are using the Microsoft Cognitive Services Speech SDK JavaScript and C# Quickstart samples, and I am getting an error while enrolling a user profile: "Activation Phrase is not matched". Are any ...
Upendra's user avatar
  • 53
0 votes
0 answers
105 views

How to use VitsModel with speaker embedding

I want to do TTS for German, and this code works perfectly: from transformers import VitsModel, AutoTokenizer import torch model = VitsModel.from_pretrained("facebook/mms-tts-deu") ...
user1680859's user avatar
  • 1,194
0 votes
0 answers
37 views

What pre-trained model can be used for speaker diarization in the Kazakh language?

I have a large dataset of call center recordings in the Kazakh language. I want to build a speaker diarization system. What pre-trained model would be useful for fine-tuning and inference? My dataset ...
Dias Balmash's user avatar
1 vote
0 answers
371 views

Whisper and pyannote 3.1 : AttributeError: 'list' object has no attribute 'get'

I'm using this script to diarize and then transcribe speech using pyannote.audio and Whisper. Using pyannote 2.1, it works perfectly, but when I change the version to the latest (3.1), I get ...
boredgirl's user avatar
1 vote
1 answer
154 views

Azure Speech diarization failing to tag speakers properly until a long 7-second statement is spoken

The Azure Speech private preview for diarization previously set an “unknown” speaker tag until it recognised a long 7-second statement from a speaker. With the API in public preview, it started tagging ...
Goofy's user avatar
  • 49
1 vote
0 answers
233 views

Why am I getting "index 0 is out of bounds for axis 0 with size 0" when using the pyAudioAnalysis library?

This question is about speaker diarization. I'm trying to make a script that separates an MP4 file into different segments depending on the speaker. (The input MP4 file contains the dialogue of ...
RonaLightfoot's user avatar
3 votes
1 answer
3k views

Way to Offline Speaker Diarization with Hugging Face

I am looking for an offline / locally saved model for speaker diarization with Hugging Face that works without authentication. I have searched Google and found no relevant links. Is there any link/...
san1's user avatar
  • 515
1 vote
0 answers
174 views

How to add speaker labels in AWS Transcribe streaming websockets

I'm using the AWS Transcribe example from https://github.com/amazon-archives/amazon-transcribe-websocket-static with a simple modification on the websocket query-string to add speaker labels. The ...
trenta3's user avatar
  • 143