Voice Recognition System
ISSN: 2395-7549
Abstract
A voice recognition system converts the human voice into a signal that machines can understand. Once this is achieved, the
machine can be made to work as desired. The machine could be a computer, a typewriter, or even a robot. Systems in which
the machine speaks a recorded word also exist, but they are outside the scope of this paper; here, only the human is expected
to talk. Further, the voice recognition systems described here are intended for use in projects only.
Keywords: Speech Recognition System, Acoustic Model, DTMF Decoder, HM 2007, Voice Recognition Module VR3
_______________________________________________________________________________________________________
I. INTRODUCTION
This technical paper aims to explain various available voice recognition systems. A range of software and hardware devices
use different techniques to decode human speech.
History
The concept of speech recognition originated in the 1940s. The first practical speech recognition program appeared in 1952
at Bell Labs [2], [3]: the "Audrey" system, which recognized digits spoken by a single voice in a noise-free environment.
This first speech recognition system could understand only digits. The 1940s and 1950s are considered the foundational
period of speech recognition technology. In this period, work was done on the foundational paradigms of speech recognition,
namely automation and information-theoretic models.
Later, this device was improved to recognize spoken words, numbers, etc., leading to the ASR (Automatic Speech Recognition)
system.
Acoustic Model
An acoustic model [1], [4] is created by taking audio recordings of speech and their text transcriptions, and using software
to create statistical representations of the sounds that make up each word. It is used by a speech recognition engine to
recognize speech. The software in this model breaks words into phonemes. Phonemes are the perceptually distinct units of
sound in a specified language that distinguish one word from another, for example p, b, d, and t in the English words pad,
pat, bad, and bat.
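The phoneme example above can be illustrated with a minimal sketch. The lexicon and its informal ARPAbet-style symbols are assumptions made for illustration only; they are not taken from any of the systems discussed in this paper.

```python
# Hypothetical pronunciation lexicon: each word maps to its phoneme sequence.
# The phoneme symbols are informal labels chosen for this illustration.
LEXICON = {
    "pad": ["P", "AE", "D"],
    "pat": ["P", "AE", "T"],
    "bad": ["B", "AE", "D"],
    "bat": ["B", "AE", "T"],
}


def differing_phonemes(word_a, word_b):
    """Return (position, phoneme_a, phoneme_b) for every position where the words differ."""
    a, b = LEXICON[word_a], LEXICON[word_b]
    if len(a) != len(b):
        raise ValueError("this sketch only compares words with the same number of phonemes")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# "pad" and "pat" differ only in their final phoneme; "pad" and "bad" only in the first.
print(differing_phonemes("pad", "pat"))
print(differing_phonemes("pad", "bad"))
```

A single differing phoneme is enough to turn one word into another, which is why the acoustic model must represent each phoneme distinctly.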
Language Model
Language modeling [1] is used in many natural language processing applications, such as speech recognition. It tries to capture
the properties of a language and to predict the next word in the speech sequence. The software of this model compares the
phonemes to words in its built-in dictionary. However, as stated earlier, this technical paper does not discuss this type of
speech model.
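Next-word prediction can be sketched with a simple bigram count model. This is only one possible illustration, not the method of any system cited here; the tiny corpus and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each other word in the training sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev_word, next_word in zip(words, words[1:]):
            counts[prev_word][next_word] += 1
    return counts


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training corpus of spoken commands.
corpus = ["turn the light on", "turn the fan on", "turn the light off"]
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # "light" follows "the" twice, "fan" only once
```

Practical language models use far larger corpora and longer contexts, but the principle of ranking candidate words by how likely they are to follow the preceding words is the same.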
IV. WORKING
The basic idea behind any speech recognition system is that the speaker first records the text to be recognized. The
recording is made through a microphone connected to a mobile phone (if software is used) or to the voice recognition device
(if recognition hardware is used). This text is retrieved when called for.
As stated earlier, there are various speech recognition systems, a few of which are discussed in this paper.
The Software Systems
Visual Basic
This is a software program [9] built around three labels: yes, no and maybe. These are initialized to a large font and a
light-gray colour, as shown in Figure 3.
A reference to the System.Speech component is made, and the code is added. When the application starts, the Windows speech
recognition system is loaded. Recognition starts after the user says "Start Listening" or clicks on the microphone icon.
Upon saying "yes", "no" or "maybe", the appropriate label lights up. If anything else is said, the labels turn back to grey.
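The label behaviour described above can be sketched as follows. This models only the decision logic, in Python rather than Visual Basic, and does not use the System.Speech API; the state names and the `update_labels` function are hypothetical.

```python
# Hypothetical display states for a label.
ACTIVE, INACTIVE = "highlighted", "grey"
LABELS = ("yes", "no", "maybe")


def update_labels(recognized_text):
    """Highlight the label matching the recognized phrase; grey out all the others.

    Any phrase other than the three known words leaves every label grey.
    """
    phrase = recognized_text.strip().lower().replace("-", "")
    return {label: (ACTIVE if label == phrase else INACTIVE) for label in LABELS}


print(update_labels("Yes"))
print(update_labels("hello"))
```

In the actual application, the recognized text would be supplied by the speech recognition engine each time it reports a recognized phrase.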
MATLAB
In this software [6], a word-detection algorithm that separates each word from ambient noise is developed first. Then, an
acoustic model that gives a robust representation of each word at the training stage is derived. Finally, an appropriate
classification algorithm for the testing stage is selected. The speech-detection algorithm is developed by processing the
prerecorded speech frame by frame within a simple loop. To detect isolated digits, a combination of signal energy and
zero-crossing counts for each speech frame is used. Signal energy works well for detecting voiced signals, while zero-crossing
counts work well for detecting unvoiced signals. Calculating these metrics is simple using core MATLAB mathematical and
logical operators. To avoid identifying ambient noise as speech, it is assumed that each isolated word will last for a certain
time period. This can also be done in hardware, using a DSP module.
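The frame-by-frame detection described above can be sketched as follows. The sketch is written in Python rather than MATLAB, and the frame length, the thresholds, the rule of combining the two metrics with a logical OR, and the minimum word duration are all assumptions chosen for illustration; they are not values from [6].

```python
import math


def frame_signal(samples, frame_len):
    """Split the signal into consecutive non-overlapping frames, dropping any partial tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def short_time_energy(frame):
    """Mean squared amplitude of the frame (tends to be high for voiced speech)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossings(frame):
    """Number of sign changes in the frame (tends to be high for unvoiced speech)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def detect_speech_frames(samples, frame_len, energy_thresh, zc_thresh):
    """Flag a frame as speech if either metric exceeds its threshold (assumed rule)."""
    return [short_time_energy(f) > energy_thresh or zero_crossings(f) > zc_thresh
            for f in frame_signal(samples, frame_len)]


def has_word(flags, min_frames):
    """Accept a word only if enough consecutive frames were flagged as speech."""
    run = 0
    for flagged in flags:
        run = run + 1 if flagged else 0
        if run >= min_frames:
            return True
    return False


# Synthetic test signal: silence, a 200 Hz tone standing in for a voiced sound, silence.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(320)]
signal = [0.0] * 160 + tone + [0.0] * 160
flags = detect_speech_frames(signal, frame_len=80, energy_thresh=0.01, zc_thresh=30)
print(flags)
print(has_word(flags, min_frames=3))
```

The minimum-duration check plays the role of the assumption that each isolated word lasts for a certain time: short bursts of noise that exceed a threshold for only a frame or two are not accepted as words.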
The Hardware Systems
DTMF
In DTMF [7], [8] there are 16 distinct tones. Each tone is the sum of two frequencies: one from a low-frequency group and one
from a high-frequency group. There are four different frequencies in each group. This system uses the same concept that is
used in a telephone. IC 8807, called a DTMF decoder, is used for this. Along with it, SAPI (an API developed for speech
recognition and speech synthesis on Windows) has to be used. Figure 4 shows the DTMF decoder circuit.
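The two-frequency scheme can be illustrated in software. The sketch below is an assumption-laden illustration, not a description of the decoder IC: it uses the standard DTMF frequency values and keypad layout (which this paper does not list), generates a tone as the sum of two sinusoids, and identifies the key with the Goertzel algorithm, a common software technique for measuring the power at a single frequency.

```python
import math

# Standard DTMF frequencies in Hz (not listed in this paper; added for illustration).
LOW_FREQS = (697, 770, 852, 941)       # one per keypad row
HIGH_FREQS = (1209, 1336, 1477, 1633)  # one per keypad column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_tone(key, sample_rate=8000, duration=0.05):
    """Generate a key's tone as the sum of its row (low) and column (high) sinusoids."""
    if len(key) != 1:
        raise ValueError("key must be a single character")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            f_low, f_high = LOW_FREQS[row], HIGH_FREQS[col]
            n_samples = int(sample_rate * duration)
            return [math.sin(2 * math.pi * f_low * n / sample_rate)
                    + math.sin(2 * math.pi * f_high * n / sample_rate)
                    for n in range(n_samples)]
    raise ValueError(f"not a DTMF key: {key!r}")


def goertzel_power(samples, freq, sample_rate):
    """Estimate the signal power at one frequency using the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode_dtmf(samples, sample_rate=8000):
    """Pick the strongest low and high frequency and map the pair back to a key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW_FREQS[i], sample_rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, HIGH_FREQS[i], sample_rate))
    return KEYPAD[row][col]


print(decode_dtmf(dtmf_tone("5")))
```

Four low frequencies combined with four high frequencies give the 4 x 4 = 16 distinct tones mentioned above. A hardware decoder performs the equivalent detection on-chip and presents the detected key as a binary code.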
IC HM2007
HM2007 [10] is a voice recognition chip, shown in Figure 5, with an on-chip analog front end and voice analysis, recognition
processing and system control functions. The input voice command is analyzed, processed and recognized, and the result is
made available at one of the chip's output ports; it is then decoded, amplified and passed to the machine.
V. CONCLUSION
From the detailed study of the various voice recognition systems discussed above, it can be concluded that, although
speaker-independent systems are also available, they are costly. Thus, the voice recognition module VR 3, which is speaker
dependent, is best suited for use in projects that build automated systems.
REFERENCES
[1] Jibran Abbasi, Muzamil Hussain, Shoaib Ahmed, An Implementation of Speech Recognition for Desktop Application, www.scribd.com.
[2] Speech Recognition: The Next Revolution, 5th edition.
[3] Sameer Shewalkar, Shoaib Ansari, Masuma Mujawar, Prof. Patil S.S., Handling PC through Speech Recognition and Air Gesture, International Journal of
Computer Science and Information Technology Research, Vol. 3, Issue 1, January - March 2015.
[4] Mark Gales, Acoustic Modeling for Speech Recognition: Hidden Markov Models and Beyond?, December 2009.
[5] Charu Joshi, Speech Recognition, www.slideshare.net.
[6] Developing an Isolated Word Recognition System in MATLAB, in.mathworks.com.
[7] Rachna Jain, Dr. S.K. Saxena, Voice Automated Mobile Robot, International Journal of Computer Applications, Volume 16, No. 2, February 2011.
[8] Sija Gopinathan, Athira Krishnan R, Renu Tony, Vishnu M, Yedhukrishnan, Wireless Voice Controlled Fire Extinguisher Robot, International Journal of
Advanced Research in Electrical, Electronics and Instrumentation Engineering, Vol. 4, Issue 4, April 2015.
[9] Madhavi Pednekar, Joel Amanna, Jino John, Abhishesh Singh, Suresh Prajapati, Don Bosco Institute of Technology, Mumbai, India, Voice Operated
Intelligent Fire Extinguishing Vehicle, 2015 International Conference on Technologies for Sustainable Development (ICTSD-2015), Feb. 04-06, 2015.
[10] Pratik Chopra, Harshad Dange, Voice Controlled Robot, engineering degree report, University of Mumbai, under the guidance of Mr. Shirish S. Halbe
(Asst. Professor & Hobby Centre Co-ordinator), Department of Electronics Engineering, K. J. Somaiya College of Engineering, Vidyavihar, 2006.
[11] S. Suresh, Y. Sindhuja Rao, Modelling of Secured Voice Recognition Based Automatic Control System, International Journal of Emerging Technology in
Computer Science & Electronics (IJETCSE), Volume 13, Issue 2, March 2015.