286 questions
0
votes
0
answers
17
views
Add a button for transcription to a Django form
I am building a website and have views with Django forms. I want to add a transcription button to each field in the Django form.
class PostForm(StylishForm):
    class Meta:
        model = Post
...
-1
votes
0
answers
47
views
Python speech-to-text using Whisper model to APK using Kivy, Buildozer
I'm working on speech-to-text using the Whisper model. It runs on my computer, but after conversion to an APK file it doesn't. The application builds and installs on Android successfully, but it does not open. ...
0
votes
2
answers
891
views
KeyError: '__version__' installing openai-whisper on Python 3.13
Getting this error when I try to download Whisper
Python version 3.13 -
I wasted a lot of time figuring out the problem. Can anyone help? I even tried reverting to a lower version, but it didn't help.
...
0
votes
0
answers
15
views
SSL: SSLV3_ALERT_BAD_RECORD_MAC sslv3 alert bad record mac (_ssl.c:2559) error while transcribing audio files from an API using whisper-turbo model
I am converting some audio calls coming from an API using OpenAI Whisper's latest large-turbo model. During conversion, at some point it gives this error ([SSL: SSLV3_ALERT_BAD_RECORD_MAC] sslv3 alert ...
0
votes
1
answer
94
views
error: subprocess-exited-with-error for pip dependencies installation
I'm building a video generation automation project and I'm trying to use openai-whisper for the subtitle generation. Other dependencies are:
#File for dependencies
requests == 2.32.3
gTTS == 2.5.3
...
1
vote
0
answers
123
views
Trying to install openai-whisper via Poetry, but getting "No module named 'whisper'"
Hello all: I am trying to add openai-whisper via Poetry. I have tried many steps including adding the git to pyproject.toml dependencies.
[tool.poetry.dependencies]
python = "^3.12"
numpy = ...
0
votes
1
answer
142
views
How can I properly use whisper_timestamped in Python 3?
How can I use whisper-timestamped's transcribe method? I have an audio file "example.mp3" and I want to generate its timestamped transcription. I have done my research: OpenAI's whisper-...
0
votes
0
answers
28
views
Is there a way to capture the sound of a specific browser tab?
I want to use Whisper to generate real-time subtitles while watching live streams, so I need the program to be able to capture the sound from the browser. However, I don't know how to do this.
I know ...
0
votes
0
answers
98
views
Whisper fine-tuning on multiple GPUs
I am fine-tuning a Whisper model, but it is running on only one GPU while the other 3 GPUs remain idle.
The problem is that it is not utilizing the other 3 GPUs.
I want to fine-tune the Whisper model ...
0
votes
0
answers
58
views
Azure AI Machine Learning Studio Crashes from faster whisper
So I am trying to run faster whisper in Azure AI Machine Learning Studio.
The code is the sample code from https://github.com/SYSTRAN/faster-whisper and runs fine on my laptop (just CPU instead of GPU)...
0
votes
0
answers
30
views
TFWhisperForConditionalGeneration model.generate() returns repetitions of first word in sequence after finetuning
I fine-tuned TFWhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny") on the German version of mozilla-foundation/common_voice_11_0. The training process looks fine (...
1
vote
1
answer
55
views
How to Process Data on GPU Instead of RAM for This Python Code?
I'm currently using the following code to process audio data, but it runs on the RAM. I want to offload the processing to the GPU to improve performance.
My code:
def prepare_dataset(batch):
...
1
vote
0
answers
121
views
Fine-Tuning Whisper for translation task "speech in custom dialect to translated text in another custom language"
I recently fine-tuned the Whisper-Tiny model on my custom speech dataset for transcription tasks, and it worked well. However, when I tried to fine-tune the model for a translation task using the same ...
0
votes
0
answers
72
views
Calculate FLOPs in Whisper model
I am trying to calculate the FLOP count of a single pass through the Whisper forward() function. I am using the thop library and getting a FLOP count of 0. I believe it's because in its register_hooks, ...
1
vote
0
answers
316
views
What causes the ECONNRESET error with OpenAI libraries?
I have an app using HTML and the OpenAI Whisper model. Everything works as intended; however, I keep getting an error when it tries to fetch the text output from the OpenAI servers.
I started the web ...
0
votes
0
answers
83
views
How to get word-level AND sentence-level transcripts with OpenAI whisper?
I am using whisper and need to provide accurate results to my end users. The 2 options are:
Using segments granularities, I get sentences but no word timestamps:
{
"id": 0,
"seek"...
1
vote
0
answers
28
views
Finetune TFWhisperForConditionalGeneration
Hi everyone,
I'm currently searching for a way to fine-tune the Hugging Face TFWhisperForConditionalGeneration model (https://huggingface.co/docs/transformers/en/model_doc/whisper) to get a model in ....
0
votes
3
answers
206
views
OpenAI API returns 400 Error: cannot transcribe ogg file
I want to create an async function for transcribing Telegram voice messages (.ogg files), but when I try to send a request to the OpenAI API I receive this error:
Error code: 400 - {'error': {'message': "Unrecognized file ...
0
votes
1
answer
199
views
Azure OpenAI Whisper REST endpoints
I wanted to use Whisper deployed through Azure OpenAI but I am having trouble finding the right resources for it.
I am trying to integrate a translator using Whisper in a Flutter app that will take ...
0
votes
0
answers
121
views
Getting chunk level output with start and end timestamps with Whisper
I am using the Whisper3 model to transcribe several audio files. However, the output I am getting is in the form of a tensor. I would like to obtain text chunks with corresponding start and end ...
0
votes
1
answer
318
views
Openai whisper error with cmd and Google Colaboratory ("Traceback"...)
I tried to install whisper, but I am getting the error messages (below) with Command Line and even with Google Colaboratory (Google Drive). I tried mp4, mp3 and wav files, but I always get error ...
0
votes
0
answers
25
views
Whisper API transcribing in Firefox but not in Chrome
I want to transcribe my audio into text, so I am using the Whisper v3-large model's API. I first send my audio file to the server, where I use multer to allow my request object to access the file ...
0
votes
1
answer
371
views
I would like to revert to the version of whisperx that my model was trained with
When I run whisperx, it tells me that its version differs from the one the model was trained with.
Also, I cannot install the old version.
This is my code
def use_whisperx(audio_file_name):
...
0
votes
0
answers
62
views
Issue with Data Preprocessing and Tensor Concatenation for Whisper Model Training
I am trying to train a Whisper model for Jeju dialect speech recognition. However, I am encountering several errors related to tensor concatenation during the data preprocessing phase. Below is the ...
0
votes
0
answers
320
views
Why does the faster_whisper model kill the kernel unexpectedly after running a few instances?
I am trying to convert some audio data into text using OpenAI Whisper. The larger model's accuracy is very good, but it is very slow to process the audio. Then I found faster-whisper ...
0
votes
0
answers
114
views
Issue with OpenAI's transcription API
I'm having some issues with OpenAI's endpoint: https://api.openai.com/v1/audio/transcriptions
I have an audio file that I'm pulling from Telegram's API. It's a .m4a file. I'm not getting errors when I'...
0
votes
0
answers
34
views
How do I access non-python files using directories in Python for the whisper library?
I'm trying to make an auto video app using the whisper module. I can't figure out how to access non-python files in python.
Code is here:
import whisper # type: ignore
model = whisper.load_model("...
0
votes
0
answers
119
views
Retrieving Token and Cost Information for OpenAI's Whisper and Text-to-Speech (TTS) Models in Langsmith?
Is it possible to retrieve token and cost information for OpenAI's Audio and TTS models in LangSmith runs while tracing?
OpenAI Audio Models
I am able to retrieve this information for chat completion ...
0
votes
0
answers
63
views
I am getting multiple errors while running a streaming model for audio. I am using insanely-fast-whisper along with the given model
Code is:
import torch
from transformers import pipeline
from transformers.utils import is_flash_attn_2_available
pipe = pipeline(
    "automatic-speech-recognition",
    model = "...
0
votes
1
answer
540
views
How can I use PowerShell to determine the active graphics card with the highest VRAM capacity?
Okay, I'm not one to ask for help, but I've been struggling for days with a feature I want to implement in a private OpenAI Whisper project. To put it in context, I'm doing a project to automate ...
0
votes
0
answers
111
views
I can't compile Whisper AI into an EXE file with PyInstaller
I'm not sure if it's an ffmpeg issue or a Whisper AI issue,
but after compiling my .py file with PyInstaller into one EXE file using this command
pyinstaller --onefile --add-binary="ffmpeg.exe;. ...
0
votes
0
answers
26
views
Is it possible to add an audio stream to an already existing audio and video stream that is being recorded by MediaRecorder?
I am building an interview application where I am interrogated by an AI and I have to answer questions. The interview is recorded using MediaRecorder. The problem is that the audio that is generated ...
0
votes
0
answers
142
views
Whisper model: generating the next token based on decoder hidden states
In the Hugging Face Whisper implementation, in the super().generate() function, the initial decoder token ids are passed to predict the next token. In short-form, I am unclear how the generate() is happening ...
0
votes
0
answers
95
views
Issue with Encoding Audio Buffer Data to Opus Size Being Incorrect in iOS Swift App
I'm trying to encode frames from an audio buffer to Opus data in a Swift iOS app, and send them to be encoded on a server by Whisper.
The error I keep getting back from the server is opuslib....
0
votes
1
answer
271
views
How to use Azure OpenAI SDK with Whisper for speech-to-text translation
Using:
<PackageReference Include="Azure.AI.OpenAI" Version="2.0.0-beta.2" />
I have this simple code:
var client = new OpenAIClient(
    new ApiKeyCredential(apiKey),
...
0
votes
1
answer
47
views
How to recreate the "view" features of common voice v11 in HuggingFace?
The Common Voice v11 on HuggingFace has some amazing View features! They include a dropdown button to select the language, and columns with the dataset features, such as client_id, audio, sentence, ...
1
vote
0
answers
109
views
How to convert .wav file to a numpy array and then back to a .wav file format without losing quality/having the same audio without any noise?
I seem to be unable to convert a .wav file to a numpy array, and then back to .wav. The audio gets too noisy.
I tried using scipy.io.wavfile.read() and librosa.
The whole reason I am trying to do ...
0
votes
0
answers
238
views
OpenAI Whisper: split audio/video File into chunks of 25MB in Node.js
I currently have an object of type File and can access its buffer, which I want to split into chunks of 25MB due to Whisper's limitations. However, I tried with subarray, but only the first chunk could ...
0
votes
0
answers
119
views
How to input MediaRecorder webm opus bytes to Whisper model?
I am recording voice, on the client side, using MediaRecorder, and sending the resulting blob of (webm, opus) bytes to the server using a WebSocket, with this code:
<script type="text/...
0
votes
1
answer
44
views
Any way to get more information about WAV files?
I have a faster-whisper model and two files. When I put the first one into the model, it returns nonsense subtitles, but when I put the second file into the model, it works perfectly.
import wave
...
1
vote
3
answers
65
views
TypeError, can't handle two datatypes
I can't convert received HLS bytes into a datatype that OpenAI Whisper can process.
Here I've got HLS via streamlink and I need some way to convert the audio part of these HLS bytes into correct numpy....
0
votes
0
answers
89
views
Whisper transcribes Ukrainian speech, but writes it in Latin characters. How to fix it?
I wrote code for transcription of Ukrainian speech. Whisper transcribes it, but the saved file is written in Latin characters. I also tried it without setting the language to Ukrainian.
model = whisper....
0
votes
0
answers
141
views
CUDA Out of Memory Error when Chunking Audio and Running Whisper Transcription in Python
I'm working on a Python script that transcribes audio files using a library that leverages CUDA for GPU acceleration. The transcription process involves chunking the audio file into smaller segments ...
2
votes
1
answer
190
views
Is there any way to transliterate Hindi audio to English using OpenAI Whisper?
I have a task where, given an audio file, I have to perform speaker diarization on it and then perform the transcription accordingly.
For speaker diarization I am using pyannote, ...
0
votes
0
answers
77
views
Not sure how to resolve this issue with the Python code
I'm using Whisper to transcribe audio to text. I decided to use distil-whisper to speed it up. I've been trying to follow the instructions on Hugging Face but keep getting an error. I'm running this ...
1
vote
0
answers
654
views
Where is the bottleneck for multiple requests using Whisper on Nvidia A100?
I want to use Whisper-Large-v3 (Speech-to-Text) for a real-time application. However, I want to process several requests at the same time. My Whisper instance runs on an Nvidia A100 with 80GB VRAM.
In ...
0
votes
1
answer
377
views
OpenAI Whisper API: How do I shorten the response latency? How do I make Whisper respond quicker?
I am making a voice chatbot. The issue is that I needed to shorten the delay between finishing my sentence while speaking to the bot and the start of the bot's response, which currently takes about 6 ...
0
votes
0
answers
115
views
Whisper for 100 users concurrently using websockets
import asyncio
import websockets
import numpy as np
import whisper
import torch
import threading
import time
from urllib.parse import parse_qs
# Load Whisper model instances globally
model_pool = [...
1
vote
1
answer
555
views
How to process audio with Whisper in Rust
I have a Tauri app that records a user's voice and sends an audio/webm base64-encoded string to a backend Rust function to process it with OpenAI's Whisper model.
It mostly works, except for the fact ...
0
votes
0
answers
662
views
How to create a live voice activity detection (VAD) loop for whisperX?
I am using the whisperX speech-to-text model to convert my voice into text input for a locally hosted LLM.
Right now, I have it set up where I can record an audio file, and then load it into whisperX. I ...