Journal for Research | Volume 03 | Issue 01 | March 2017
ISSN: 2395-7549
Voice Recognition System

Shruti Joshi, Student, Department of Computer Engineering, Don Bosco College of Engineering, Goa, India
Aarti Kumari, Student, Department of Computer Engineering, Don Bosco College of Engineering, Goa, India
Pooja Pai, Student, Department of Computer Engineering, Don Bosco College of Engineering, Goa, India
Saiesh Sangaonkar, Student, Department of Computer Engineering, Don Bosco College of Engineering, Goa, India
Prof. Melba DSouza, Assistant Professor, Department of Computer Engineering, Don Bosco College of Engineering, Goa, India
Abstract
Abstract
A voice recognition system converts human speech into signals that a machine can understand. Once this is achieved, the machine can be made to work as desired. The machine could be a computer, a typewriter, or even a robot. Systems in which the machine speaks the recorded word are also available, but they are outside the scope of this paper; here, only the human is expected to talk. Further, the voice recognition systems described here are suitable for use in projects only.
Keywords: Speech Recognition System, Acoustic Model, DTMF Decoder, HM2007, Voice Recognition Module VR3
_______________________________________________________________________________________________________
I. INTRODUCTION
This technical paper aims to explain the various voice recognition systems that are available. Different software and hardware devices use different techniques to decode human speech.
History
The concept of speech recognition dates back to the 1940s. Practically, the first speech recognition program appeared in 1952 at Bell Labs [2], [3]: the "Audrey" system, designed by Bell Laboratories, recognized digits spoken by a single voice in a noise-free environment. This first speech recognition system could understand only digits. The 1940s and 1950s are considered the foundational period of speech recognition technology. During this period, work was done on the foundational paradigms of speech recognition, namely automation and information-theoretic models.
Later, such devices were improved to recognize spoken words, numbers, etc., leading to ASR (Automatic Speech Recognition) systems.
II. LITERATURE SURVEY
Types of Speech Recognition

Speech recognition systems can be divided into a number of classes based on the kinds of utterances they are able to recognize. A few classes of speech recognition [1], [3] are as follows:
Isolated Speech
Isolated-word recognizers usually require a pause between two utterances. This does not mean that they accept only a single word, but that they require one utterance at a time.
Connected Speech
Connected-word (connected speech) recognition is similar to isolated speech, but allows separate utterances to be spoken with minimal pauses between them.
Continuous Speech
Continuous speech recognition allows the user to speak almost naturally; it is also called computer dictation.
Spontaneous Speech
At a basic level, spontaneous speech can be thought of as speech that is natural-sounding and not rehearsed. An ASR system with spontaneous speech ability should be able to handle a variety of natural speech features, such as words being run together, "ums" and "ahs", and even slight stutters.
All rights reserved by www.journal4research.org 6

(J4R/ Volume 03 / Issue 01 / 002)
III. BLOCK DIAGRAM
Modeling of the System

For a detailed understanding of the voice recognition system, consider the block diagram shown in Figure 1. The voice input captures the spoken words, and the A/D converter then converts this analog signal into digital form. The modeling can then be done in two ways, as described below.
Fig. 1: General block diagram of speech recognition system
Acoustic Model
An acoustic model [1], [4] is created by taking audio recordings of speech together with their text transcriptions and using software to build statistical representations of the sounds that make up each word. A speech recognition engine uses this model to recognize speech. The software in this model breaks words into phonemes. Phonemes are the perceptually distinct units of sound in a specified language that distinguish one word from another, for example p, b, d, and t in the English words pad, pat, bad, and bat.
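How a single phoneme separates pad, pat, bad and bat can be illustrated with a small pronunciation lexicon. The sketch below is only an illustration under stated assumptions: the simplified ARPAbet-style phoneme symbols and the helper names `differing_positions` and `minimal_pairs` are hypothetical and are not part of any acoustic-model toolkit. A real acoustic model would additionally attach a statistical model (for example, a hidden Markov model, as discussed in [4]) to each phoneme.

```python
# Hypothetical mini pronunciation lexicon: each word broken into phonemes.
# The simplified ARPAbet-style symbols are an assumption for illustration.
LEXICON = {
    "pad": ("P", "AE", "D"),
    "pat": ("P", "AE", "T"),
    "bad": ("B", "AE", "D"),
    "bat": ("B", "AE", "T"),
}


def differing_positions(a, b):
    """Return the indices at which two equal-length phoneme sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def minimal_pairs(lexicon):
    """List word pairs whose pronunciations differ in exactly one phoneme."""
    words = sorted(lexicon)
    pairs = []
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            p1, p2 = lexicon[w1], lexicon[w2]
            if len(p1) == len(p2) and len(differing_positions(p1, p2)) == 1:
                pairs.append((w1, w2))
    return pairs


if __name__ == "__main__":
    for w1, w2 in minimal_pairs(LEXICON):
        print(w1, "/", w2)
```

Each printed pair differs in exactly one phoneme (for instance, pad and bad differ only in P versus B), which is why the recognizer must tell those sounds apart.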
Language Model
Language modeling [1] is used in many natural language processing applications, such as speech recognition. It tries to capture the properties of a language and to predict the next word in a speech sequence. The software in this model compares the phonemes to words in its built-in dictionary. However, this technical paper does not discuss this type of speech model further.
IV. WORKING
The basic idea behind any speech recognition system is that the speaker first records the words that are to be recognized. The recording is done through a microphone connected either to a mobile device (if software is used) or to the voice recognition device (if recognition hardware is used). The stored recording is then retrieved when called for.
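The record-then-retrieve idea can be sketched as a speaker-dependent template matcher. This sketch is not taken from any of the systems in this paper: it assumes that each recording has already been reduced to a sequence of numeric features (the hypothetical lists below), and it uses dynamic time warping (DTW), a common way of comparing utterances spoken at different speeds, to find the stored template closest to a new utterance. The class and method names are illustrative only.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


class TemplateRecognizer:
    """Speaker-dependent recognizer: one stored template per word."""

    def __init__(self):
        self.templates = {}

    def train(self, label, features):
        # Recording step: keep the speaker's feature sequence for this word.
        self.templates[label] = list(features)

    def recognize(self, features):
        # Retrieval step: return the label whose template is closest.
        if not self.templates:
            raise RuntimeError("no templates have been trained")
        return min(self.templates,
                   key=lambda lbl: dtw_distance(self.templates[lbl], features))


if __name__ == "__main__":
    rec = TemplateRecognizer()
    rec.train("on", [1, 3, 5, 3, 1])
    rec.train("off", [5, 5, 1, 1, 1])
    # A slower utterance of "on" still matches its template exactly.
    print(rec.recognize([1, 1, 3, 5, 5, 3, 1]))  # prints "on"
```

Because the templates come from one speaker, this arrangement is inherently speaker-dependent, which is the same trade-off made by the hardware modules described later.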
As mentioned earlier, there are various speech recognition systems, a few of which are discussed in this paper.
The Software Systems
Visual Basic
This is a software [9] program based on three labels: yes, no and maybe. These are initialized to a large font and a light-gray colour, as shown in Figure 3.
Fig. 3: A window showing YES-NO recognizer
A reference to the System.Speech component is added, and the code is written. When the application starts, the Windows speech recognition system is loaded. Recognition starts after the user says "Start Listening" or clicks the microphone icon. Upon hearing "yes", "no" or "maybe", the appropriate label lights up; if anything else is said, the labels turn back to gray.
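The behaviour of the three labels can be sketched independently of the speech engine. The sketch below is written in Python rather than Visual Basic and does not use the System.Speech API; the function name `update_labels` and the highlight colour are assumptions. It receives the recognized phrase as text and returns the colour each label should take.

```python
# Labels from the text; the highlight colour is an assumption for this sketch.
LABELS = ("yes", "no", "maybe")
IDLE_COLOUR = "lightgray"   # initial colour of every label
ACTIVE_COLOUR = "green"     # hypothetical highlight colour


def update_labels(recognized_text):
    """Return the colour each label should take after one recognition result.

    The matching label lights up; every other label (or all of them, if the
    phrase is not one of the three words) goes back to the idle colour.
    """
    word = recognized_text.strip().lower()
    return {label: ACTIVE_COLOUR if label == word else IDLE_COLOUR
            for label in LABELS}


if __name__ == "__main__":
    # Simulated recognizer output, standing in for the speech engine's events.
    for phrase in ["yes", "maybe", "hello"]:
        print(phrase, update_labels(phrase))
```

Keeping this logic separate from the recognition events makes it easy to test without a microphone.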
MATLAB
In this approach [6], a word-detection algorithm that separates each word from ambient noise is developed first. Then, an acoustic model that gives a robust representation of each word is derived at the training stage. Finally, an appropriate classification algorithm is selected for the testing stage. The speech-detection algorithm processes the prerecorded speech frame by frame within a simple loop. To detect isolated digits, it uses a combination of signal energy and zero-crossing count for each speech frame: signal energy works well for detecting voiced signals, while zero-crossing counts work well for detecting unvoiced signals. Both metrics are simple to calculate using core MATLAB mathematical and logical operators. To avoid identifying ambient noise as speech, each isolated word is assumed to last for at least a certain minimum time period. The same processing can also be done in hardware, using a DSP module.
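A minimal sketch of this frame-by-frame detector is shown below. It is written in Python rather than MATLAB and is not the MathWorks example [6]; the frame length, both thresholds and the minimum word length `min_frames` are assumed values that would need tuning on real recordings. Silence is modelled here as exact zeros, so the sketch does not address the fact that real background noise also produces zero crossings.

```python
import math


def frame_signal(samples, frame_len):
    """Split samples into consecutive, non-overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossings(frame):
    """Number of sign changes between neighbouring samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def detect_words(samples, frame_len, energy_thr, zc_thr, min_frames):
    """Return (first_frame, last_frame) spans judged to be words.

    A frame counts as speech if its energy exceeds energy_thr (voiced sounds)
    or its zero-crossing count exceeds zc_thr (unvoiced sounds). Runs shorter
    than min_frames are discarded as ambient noise.
    """
    flags = [frame_energy(f) > energy_thr or zero_crossings(f) > zc_thr
             for f in frame_signal(samples, frame_len)]
    spans, start = [], None
    for i, is_speech in enumerate(flags + [False]):  # sentinel closes last run
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            if i - start >= min_frames:
                spans.append((start, i - 1))
            start = None
    return spans


if __name__ == "__main__":
    rate, n = 8000, 80  # assumed 8 kHz sampling, 10 ms frames
    burst = [0.5 * math.sin(2 * math.pi * 500 * i / rate) for i in range(n)]
    word = [0.5 * math.sin(2 * math.pi * 500 * i / rate) for i in range(6 * n)]
    silence = [0.0] * (4 * n)
    signal = silence + burst + silence + word + silence
    # The one-frame burst is rejected as too short; the six-frame word is kept.
    print(detect_words(signal, n, energy_thr=0.01, zc_thr=30, min_frames=3))
```

The minimum-duration rule is what keeps the short burst from being reported as a word, mirroring the assumption in the text that each isolated word lasts for some minimum time.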
The Hardware Systems
DTMF
In DTMF (Dual-Tone Multi-Frequency) signaling [7], [8], there are 16 distinct tones. Each tone is the sum of two frequencies: one from a low frequency group and one from a high frequency group, with four different frequencies in each group. This is the same scheme used in telephone keypads. IC 8870, called a DTMF decoder, is used for this purpose. Along with it, SAPI (an API developed for speech recognition and speech synthesis on Windows) has to be used. Figure 4 shows the DTMF decoder circuit.
Fig. 4: DTMF decoder circuit using IC 8870
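To make the two-frequency scheme concrete, the sketch below synthesizes a DTMF tone and decodes it in software. It is an illustration under assumed parameters (8 kHz sampling, 50 ms tones) and uses the Goertzel algorithm, a common software technique for detecting individual frequencies; it does not describe how the IC 8870 works internally. The row and column frequencies are the standard telephone DTMF frequencies.

```python
import math

LOW_FREQS = (697, 770, 852, 941)        # low group (rows), Hz
HIGH_FREQS = (1209, 1336, 1477, 1633)   # high group (columns), Hz
KEYPAD = ("123A", "456B", "789C", "*0#D")  # 4 x 4 = 16 distinct tones


def dtmf_tone(key, rate=8000, duration=0.05):
    """Synthesize a DTMF tone: the sum of one low and one high frequency."""
    for row, keys in enumerate(KEYPAD):
        if key in keys:
            f_low, f_high = LOW_FREQS[row], HIGH_FREQS[keys.index(key)]
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    n = int(rate * duration)
    return [math.sin(2 * math.pi * f_low * t / rate) +
            math.sin(2 * math.pi * f_high * t / rate) for t in range(n)]


def goertzel_power(samples, freq, rate):
    """Signal power at `freq`, computed with the Goertzel algorithm."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def decode_dtmf(samples, rate=8000):
    """Pick the strongest low and high frequency and map them to a key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, LOW_FREQS[i], rate))
    col = max(range(4), key=lambda j: goertzel_power(samples, HIGH_FREQS[j], rate))
    return KEYPAD[row][col]


if __name__ == "__main__":
    print(decode_dtmf(dtmf_tone("5")))  # prints "5"
```

Goertzel is used instead of a full FFT because only eight frequencies need to be checked.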
IC HM2007
HM2007 [10] is a voice recognition chip, shown in Figure 5, with on-chip analog front end, voice analysis, recognition processing and system control functions. The input voice command is analyzed, processed and recognized, and the result appears at one of its output ports. This output is then decoded, amplified and given to the machine.
Fig. 5: Voice recognition circuit using IC HM2007
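The software side of decoding the output port can be sketched as follows. This sketch rests on assumptions that the paper does not state: that the port value is read as one byte holding the recognized word number as two BCD digits, and that word numbers map to the hypothetical `COMMANDS` table. The actual output format should be taken from the HM2007 datasheet.

```python
# Hypothetical command table: recognized word number -> machine action.
COMMANDS = {1: "forward", 2: "reverse", 3: "left", 4: "right", 5: "stop"}


def bcd_to_int(port_value):
    """Decode an 8-bit value assumed to hold two BCD digits (tens, units)."""
    if not 0 <= port_value <= 0xFF:
        raise ValueError("port value must fit in 8 bits")
    tens, units = port_value >> 4, port_value & 0x0F
    if tens > 9 or units > 9:
        raise ValueError(f"not a valid BCD byte: {port_value:#04x}")
    return tens * 10 + units


def decode_command(port_value):
    """Map a port value to a command name, or None if it is not in the table."""
    return COMMANDS.get(bcd_to_int(port_value))


if __name__ == "__main__":
    for value in (0x01, 0x05, 0x12):
        print(f"{value:#04x} ->", decode_command(value))
```

Returning `None` for unknown word numbers lets the controlling program ignore anything it was not trained to act on.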
Voice Recognition System VR 3 [11]

This is a compact, easy-to-control speech recognition board, shown in Figure 6. It is a speaker-dependent voice recognition module that supports up to 80 voice commands in all, of which a maximum of 7 can be active at the same time. Any sound can be trained as a command, and users need to train the module before it can recognize any voice command. The board can be controlled in two ways: through the serial port (full functionality) or through the general input pins (partial functionality). The general output pins on the board can generate several kinds of waveforms when the corresponding voice command is recognized. The module is Arduino compatible.
Fig. 6: The voice recognition module VR 3
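The capacity limits described above (80 trained commands, at most 7 active at once, training required before recognition) can be captured in a small model for planning a project. The class below is a hypothetical sketch for reasoning about those limits only; it does not implement the VR3's serial protocol or any Arduino library, and the 0-based record numbering is an assumption.

```python
MAX_RECORDS = 80  # total trained commands supported, per the text
MAX_LOADED = 7    # commands that can be active at the same time, per the text


class VoiceModuleModel:
    """Toy model of a speaker-dependent module's command capacity.

    Not the VR3 protocol: it only mirrors the limits described in the text.
    """

    def __init__(self):
        self.trained = {}  # record index -> command label
        self.loaded = []   # record indices currently active

    def train(self, index, label):
        if not 0 <= index < MAX_RECORDS:
            raise ValueError(f"record index must be in 0..{MAX_RECORDS - 1}")
        self.trained[index] = label

    def load(self, *indices):
        for i in indices:
            if i not in self.trained:
                raise ValueError(f"record {i} has not been trained")
        if len(set(self.loaded) | set(indices)) > MAX_LOADED:
            raise ValueError(f"at most {MAX_LOADED} commands can be loaded")
        for i in indices:
            if i not in self.loaded:
                self.loaded.append(i)

    def can_recognize(self, label):
        """True if a loaded record carries this label."""
        return any(self.trained[i] == label for i in self.loaded)


if __name__ == "__main__":
    module = VoiceModuleModel()
    for i in range(10):
        module.train(i, f"cmd{i}")
    module.load(*range(7))
    print(module.can_recognize("cmd0"), module.can_recognize("cmd8"))
```

A project that needs more than seven commands would have to swap records in and out, for example one group of commands per operating mode.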
V. CONCLUSION
From the detailed study of the voice recognition systems discussed above, it can be concluded that although speaker-independent systems are available, they are costly. Thus, the speaker-dependent voice recognition module VR3 is best suited for use in projects that build automated systems.
REFERENCES
[1] Jibran Abbasi, Muzamil Hussain, Shoaib Ahmed, "An Implementation of Speech Recognition for Desktop Application", www.scribd.com.
[2] "Speech Recognition: The Next Revolution", 5th edition.
[3] Sameer Shewalkar, Shoaib Ansari, Masuma Mujawar, Prof. Patil S. S., "Handling PC through Speech Recognition and Air Gesture", International Journal of Computer Science and Information Technology Research, Vol. 3, Issue 1, January-March 2015.
[4] Mark Gales, "Acoustic Modeling for Speech Recognition: Hidden Markov Models and Beyond?", December 2009.
[5] Charu Joshi, "Speech Recognition", www.slideshare.net.
[6] "Developing an Isolated Word Recognition System in MATLAB", in.mathworks.com.
[7] Rachna Jain, Dr. S. K. Saxena, "Voice Automated Mobile Robot", International Journal of Computer Applications, Vol. 16, No. 2, February 2011.
[8] Sija Gopinathan, Athira Krishnan R, Renu Tony, Vishnu M, Yedhukrishnan, "Wireless Voice Controlled Fire Extinguisher Robot", International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, Vol. 4, Issue 4, April 2015.
[9] Madhavi Pednekar, Joel Amanna, Jino John, Abhishesh Singh, Suresh Prajapati (Don Bosco Institute of Technology, Mumbai, India), "Voice Operated Intelligent Fire Extinguishing Vehicle", 2015 International Conference on Technologies for Sustainable Development (ICTSD-2015), Feb. 04-06, 2015.
[10] Pratik Chopra, Harshad Dange, "Voice Controlled Robot", Engineering Degree report, University of Mumbai, under the guidance of Mr. Shirish S. Halbe (Asst. Professor & Hobby Centre Co-ordinator), Department of Electronics Engineering, K. J. Somaiya College of Engineering, Vidyavihar, 2006.
[11] S. Suresh, Y. Sindhuja Rao, "Modelling of Secured Voice Recognition Based Automatic Control System", International Journal of Emerging Technology in Computer Science & Electronics (IJETCSE), Vol. 13, Issue 2, March 2015.