Speech Translation
How it works
A speech translation system typically integrates three software technologies: automatic
speech recognition (ASR), machine translation (MT), and voice synthesis (text-to-speech, TTS).
The speaker of language A speaks into a microphone and the speech recognition module
recognizes the utterance. It compares the input against a phonological model built from a
large corpus of speech data from multiple speakers. The input is then converted into a string of
words, using the dictionary and grammar of language A, based on a massive corpus of text in
language A.
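As a rough illustration of this recognition step, the Python sketch below scores two candidate transcriptions by combining an acoustic score (how well the words match the audio, per the phonological model) with a language-model score (how plausible the words are in language A), and picks the best product. The candidate strings and all probabilities are invented for illustration; a real recognizer derives them from models trained on the corpora described above.

```python
# Toy noisy-channel decoding: choose the word string W maximizing
# P(audio | W) * P(W). All scores here are made-up placeholders.

# P(audio | words): fit between the waveform and each candidate,
# as judged by the acoustic (phonological) model.
acoustic_score = {
    "recognize speech": 0.40,
    "wreck a nice beach": 0.45,  # acoustically similar, slightly better fit
}

# P(words): plausibility of each candidate in language A,
# as judged by the language model (dictionary and grammar).
language_score = {
    "recognize speech": 0.30,
    "wreck a nice beach": 0.02,  # grammatical, but far less likely
}

def decode(candidates):
    """Return the candidate maximizing P(audio | words) * P(words)."""
    return max(candidates, key=lambda w: acoustic_score[w] * language_score[w])

print(decode(acoustic_score))  # -> 'recognize speech'
```

Even though the wrong candidate fits the audio slightly better, the language model tips the decision toward the string that is plausible in language A.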
The machine translation module then translates this string. Early systems replaced every word
with a corresponding word in language B. Current systems do not use word-for-word translation,
but rather take into account the entire context of the input to generate the appropriate translation.
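The sketch below illustrates the early word-for-word approach with a tiny hypothetical English-to-French dictionary. It also shows why context matters: the output keeps English word order and misses the idiomatic French phrasing, which is exactly what contextual translation is meant to fix.

```python
# Word-for-word replacement in the style of early systems.
# The dictionary entries are illustrative assumptions.
dictionary = {"how": "comment", "old": "vieux", "are": "êtes", "you": "vous"}

def word_for_word(sentence: str) -> str:
    # Replace each word with its dictionary entry; keep unknown words as-is.
    return " ".join(dictionary.get(w, w) for w in sentence.lower().split())

print(word_for_word("How old are you"))
# -> 'comment vieux êtes vous' (idiomatic French: 'quel âge avez-vous')
```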
The generated translation utterance is sent to the speech synthesis module, which estimates the
pronunciation and intonation matching the string of words based on a corpus of speech data in
language B. Waveforms matching the text are selected from this database, and the speech
synthesis module concatenates and outputs them.[1]
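As a simplified illustration of this selection-and-concatenation step, the sketch below looks up a placeholder waveform snippet for each output word and joins them. The unit inventory and sample values are invented; a real database holds recorded speech in language B, indexed by units such as diphones, with many prosody-matched variants per unit.

```python
# Toy concatenative synthesis: look up a prerecorded snippet per unit
# and join them into one sample stream. Values are placeholders, not audio.
waveform_db = {
    "guten": [0.1, 0.3, -0.2],   # hypothetical samples for the word "guten"
    "tag":   [0.0, 0.4, -0.1],
    "<sil>": [0.0, 0.0],         # short silence inserted between units
}

def synthesize(words: list[str]) -> list[float]:
    samples: list[float] = []
    for w in words:
        samples += waveform_db[w] + waveform_db["<sil>"]
    return samples

print(synthesize(["guten", "tag"]))
```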
History
In 1983, NEC Corporation demonstrated speech translation as a concept exhibit at the ITU
Telecom World (Telecom '83).[2]
In 1999, the C-Star-2 consortium demonstrated speech-to-speech translation across five
languages: English, Japanese, Italian, Korean, and German.[3][4]
Features
Apart from the problems involved in text translation, speech translation also has to deal with
problems specific to speech-to-speech translation, including the incoherence of spoken
language, looser grammatical constraints, unclear word boundaries, the correction of speech
recognition errors, and multiple optional inputs. On the other hand, speech-to-speech
translation also has advantages over text translation, including the less complex structure and
smaller vocabulary of spoken language.
Standards
As more countries research and develop speech translation, it becomes necessary to
standardize interfaces and data formats to ensure that the systems are mutually compatible.
International joint research is being fostered by speech translation consortiums (e.g. the C-STAR
international consortium for joint research of speech translation and A-STAR for the Asia-
Pacific region). They were founded as "international joint-research organization[s] to design
formats of bilingual corpora that are essential to advance the research and development of this
technology ... and to standardize interfaces and data formats to connect speech translation
module internationally".[1]
Applications
Today, speech translation systems are used throughout the world, in settings such as
medical facilities, schools, police stations, hotels, retail stores, and factories. They are
applicable anywhere spoken language is used to communicate. A popular application
is Jibbigo, which works offline.