Offline Speech Recognition Made Easy With VOSK


Speech recognition is one of the main branches of machine learning and can be used in many of
your everyday projects to enhance their accessibility. Among the most popular uses of speech
recognition, Google’s Assistant and Amazon’s Alexa top the list. But there’s a catch with them:
these speech recognition APIs are only available over the internet and cannot be accessed
without a connection. This is all well and good, since the most prominent use of speech
recognition is in household technology, and it is hard to find a home without an internet
connection in this “online era” (a little extra true for 2020). But what if you wanted to use speech
recognition in an offline environment? Luckily, we have a ready-made solution for that, and one
which doesn’t require much effort on your end!
Let’s have an intro
Vosk is a free, open-source toolkit for offline speech recognition with a simple Python API. It
supports speech recognition in 16 languages: English, Indian English, French, Spanish,
Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Arabic, and Greek, with
German, Catalan, and Farsi being the most recent additions.
Now, since VOSK supports a number of languages, there is a separate model for each language.
These are small models (around 50 MB each), but you can find bigger server-grade models as
well. The models come with a variety of features such as speaker identification, a streaming API,
and a reconfigurable vocabulary.
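To give a flavour of the API, here is a minimal sketch of loading a model and using the
reconfigurable-vocabulary feature to restrict the recognizer to a handful of phrases (you can
come back to it after the installation steps below). The model folder name "model" and the
phrase list are just placeholders for illustration.

from vosk import Model, KaldiRecognizer

# Load a downloaded model folder (here assumed to be extracted and renamed to "model")
model = Model("model")

# A plain recognizer for 16 kHz, 16-bit mono audio
rec = KaldiRecognizer(model, 16000)

# A recognizer restricted to a small vocabulary (reconfigurable vocabulary);
# the phrase list below is only an example
rec_limited = KaldiRecognizer(model, 16000,
                              '["turn on the light", "turn off the light", "[unk]"]')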
Another important feature of this toolkit is its availability on a variety of platforms, from desktop
servers with Python, Java, C#, and Node bindings to lightweight devices such as Android, iOS,
and Raspberry Pi. So you can use it for anything from video transcription to home automation
and chatbots.
Before we begin, it is important to note that speech recognition tools are still far from perfect, so
the results can be unexpected at times. You can still rely on VOSK to provide a fairly good level
of accuracy, and if it falls short, you can adapt the models to work better with your system.
Enough about the library, time to walk the talk.
Installing VOSK
The easiest way to install VOSK is with the pip command. But before that, it is advisable to set
up a virtual environment to isolate the dependencies. It is not necessary, but it is good practice
when you are working with different versions of Python.
A virtual environment can be set up and activated in three steps:
• Install the virtualenv package

pip install virtualenv

• Create the virtual environment

virtualenv myenv

• Activate the virtual environment

myenv\Scripts\activate (for Windows)

source myenv/bin/activate (for Linux)
Once the virtual environment has been set up and activated, the next step is to make sure the
Python interpreter inside it satisfies the following requirements:
• Python version: 3.5-3.8 (Linux), 3.6-3.7 (ARM), 3.8 (OSX), 3.8 64-bit (Windows)
• pip version: 19.0 or newer
After that, you can go ahead and install VOSK using the pip command:
pip install vosk
The vosk API should now be installed on your system. If you get an error, make sure your
Python version matches the requirements above.
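A quick, optional way to confirm that the installation worked is to import the package from
inside the activated virtual environment:

python -c "from vosk import Model, KaldiRecognizer; print('vosk is ready')"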
Working with VOSK
Now that we are done with the installation, it is time to see how you can put it to use!
With the virtual environment created and activated, and the vosk-api package installed inside it,
the next step is to clone the vosk GitHub repo (https://github.com/alphacep/vosk-api)
into your root folder. You can find how to clone a GitHub repository here
(https://docs.github.com/en/free-pro-team@latest/github/creating-cloning-and-archiving-
repositories/cloning-a-repository).
After this, you need a model to work with the library. You can download one of the models from
here (https://alphacephei.com/vosk/models) according to your choice of language, or you can
train a model of your own. Download the model into your root folder and copy it into the .\
vosk-api\python\example folder.
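Before moving on to the bundled example scripts, here is a rough sketch of what transcribing a
plain 16 kHz, 16-bit mono WAV file looks like with vosk. The file name test.wav and the model
folder name model are assumptions for illustration, not files shipped with the repo.

import wave, json
from vosk import Model, KaldiRecognizer

wf = wave.open("test.wav", "rb")            # assumed: 16 kHz, 16-bit, mono WAV
model = Model("model")                      # assumed: extracted model folder named "model"
rec = KaldiRecognizer(model, wf.getframerate())

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        print(json.loads(rec.Result())["text"])

print(json.loads(rec.FinalResult())["text"])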
Now, let’s get to work!
Speech recognition using microphone
Speech recognition through the microphone doesn’t work without the PyAudio module, so you
have to install it using, again, the pip command.
pip install pyaudio
Now, here’s a catch. Installing PyAudio with pip does not work out of the box on Windows when
you are using Python 3.7 or greater; you can follow this answer
(https://stackoverflow.com/questions/52283840/i-cant-install-pyaudio-on-windows-how-to-
solve-error-microsoft-visual-c-14/52284344#52284344) to successfully install PyAudio on your
system.
Now, just one more step before you can start the microphone test. Go to your virtual
environment’s Lib\site-packages folder (for example, myenv\Lib\site-packages) and find the
pyaudio.py file. Modify it so that the exception_on_overflow parameter in the read function
defaults to False (alternatively, you can pass exception_on_overflow=False directly to the
stream’s read call).
Let’s run the microphone test. Navigate to the .\vosk-api\python\example folder in your terminal
and execute the test_microphone.py file.
python test_microphone.py
As you speak into your microphone, you will see the speech recognizer working its magic, with
the transcribed words appearing in your terminal window.
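For reference, the microphone example boils down to something like the sketch below, assuming
PyAudio is used for capture; it also shows where the exception_on_overflow=False setting
mentioned above comes into play. The buffer sizes and the model folder name are illustrative.

import json
import pyaudio
from vosk import Model, KaldiRecognizer

model = Model("model")                      # assumed: model folder next to the script
rec = KaldiRecognizer(model, 16000)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000,
                input=True, frames_per_buffer=8000)
stream.start_stream()

while True:
    data = stream.read(4000, exception_on_overflow=False)
    if rec.AcceptWaveform(data):
        print(json.loads(rec.Result())["text"])
    # rec.PartialResult() would give the in-progress hypothesis here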
Speech recognition using a .mp4 file
If you want to use vosk for transcribing a video, you can do that by following this section. All
you need is a sample video to run speech recognition on and the FFmpeg package, a
command-line tool for processing multimedia files.
You can easily find a sample .mp4 video file on the internet or record one of your own, and the
FFmpeg package can be downloaded from here
(https://ffmpeg.org/download.html#build-windows).
Once both of the requirements are met, put your video in the .\vosk-api\python\example folder
and look for the ffmpeg.exe file in the bin folder of the FFmpeg build you downloaded (for
example, ffmpeg-2020-10-03-git-069d2b4a50-full_build\bin), which you have to copy into the
same folder as your video.
Now you can start speech recognition on the video file by executing the test_ffmpeg.py file.
python test_ffmpeg.py sample.mp4
The speech-to-text transcription of the video will appear in the terminal window.
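Under the hood, test_ffmpeg.py works roughly like the sketch below: FFmpeg decodes the video
and resamples the audio to 16 kHz mono PCM, which is piped into the recognizer. The exact
flags, chunk size, and model folder name here are illustrative.

import subprocess, sys, json
from vosk import Model, KaldiRecognizer

model = Model("model")
rec = KaldiRecognizer(model, 16000)

# Ask ffmpeg to decode the input file to raw 16 kHz, 16-bit mono PCM on stdout
process = subprocess.Popen(
    ["ffmpeg", "-loglevel", "quiet", "-i", sys.argv[1],
     "-ar", "16000", "-ac", "1", "-f", "s16le", "-"],
    stdout=subprocess.PIPE)

while True:
    data = process.stdout.read(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        print(json.loads(rec.Result())["text"])

print(json.loads(rec.FinalResult())["text"])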
