Offline Speech Recognition Made Easy With VOSK
Speech recognition is one of the major applications of machine learning, and you can use it in
many of your everyday projects to make them more accessible. Among the most popular uses of
speech recognition, Google Assistant and Amazon’s Alexa top the list. But there’s a catch with
them: these speech recognition services are only available over the internet and cannot be
accessed without a connection. That is fine most of the time, since the most prominent use of
speech recognition is in household technology, and it is hard to find a home without an internet
connection in this “online era” (a little extra true for 2020). But what if you wanted to use speech
recognition in an offline environment? Luckily, there is a ready-made solution for that, and one
that doesn’t require much effort on your end!
Let’s have an intro
Vosk is a free, open-source toolkit for offline speech recognition that you can use from Python.
It supports speech recognition in 16 languages: English, Indian English, French, Spanish,
Portuguese, Chinese, Russian, Turkish, Vietnamese, Italian, Dutch, Arabic, and Greek, with
German, Catalan, and Farsi being the recent additions.
Since VOSK supports a number of languages, there is a separate model for each one. These are
small models (about 50 MB each), but bigger server models are also available. The models come
with a variety of features such as speaker identification, streaming APIs, and reconfigurable
vocabulary.
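To make the streaming API and the reconfigurable vocabulary a little more concrete, here is a
minimal sketch of how they are typically used from Python. It assumes the vosk package is
installed, that a model has been downloaded and unpacked into a local folder, and that the audio
is a 16-bit mono PCM WAV file. The function names build_grammar and transcribe are made up for
this illustration and are not part of the library, and restricting the vocabulary may only work
with models that support it.

```python
import json
import wave


def build_grammar(phrases):
    """Serialize the allowed phrases as a JSON list string.

    "[unk]" is added so that anything outside the list maps to an
    unknown token instead of being forced onto one of the phrases.
    """
    return json.dumps(list(phrases) + ["[unk]"])


def transcribe(wav_path, model_dir, phrases=None):
    """Feed a WAV file to the recognizer chunk by chunk and return the text.

    wav_path and model_dir are placeholders for your own file and model folder.
    """
    # Imported here so build_grammar() can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)
    with wave.open(wav_path, "rb") as wf:
        # Assumption: the file is 16-bit mono PCM.
        if phrases:
            rec = KaldiRecognizer(model, wf.getframerate(), build_grammar(phrases))
        else:
            rec = KaldiRecognizer(model, wf.getframerate())

        pieces = []
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform() returns True when an utterance is complete.
            if rec.AcceptWaveform(data):
                pieces.append(json.loads(rec.Result())["text"])
        # Flush whatever is still buffered at the end of the file.
        pieces.append(json.loads(rec.FinalResult())["text"])

    return " ".join(p for p in pieces if p)
```

Feeding the audio in small chunks is what makes this a streaming approach: the same loop could
read from a microphone instead of a file.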
Another important feature of this toolkit is its availability on a variety of platforms, from
desktops and servers with Python, Java, C#, and Node bindings to lightweight devices such as
Android, iOS, and Raspberry Pi. So you can use it for anything from video transcription to home
technologies and chatbots.
Before we begin, it is important to note that no speech recognition tool is perfectly accurate
yet, so the results can be unexpected at times. You can still rely on it to provide a fairly good
level of accuracy. If that is not enough, you can adapt the models to work better with your
system.
Enough about the library, time to walk the talk.
Installing VOSK
The easiest way to install VOSK is with the pip command. Before that, however, it is advisable
to set up a virtual environment to keep its dependencies isolated. It is not necessary, but it is
good practice when you are working with different versions of Python.
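If you are ever unsure whether pip is about to install into a virtual environment or into your
system Python, a small check like the one below can help. This is only a sketch using the
standard library; the function name in_virtual_env is made up for this illustration.

```python
import sys


def in_virtual_env():
    """Return True if this interpreter is running inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases instead make sys.prefix differ from sys.base_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(
        sys, "base_prefix", sys.prefix
    )


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtual_env())
```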
A virtual environment can be set up and activated in three steps:
Installing virtualenv package