Vocoder

From Wikipedia, the free encyclopedia

A vocoder (/ˈvoʊkoʊdər/, short for voice encoder) is an analysis/synthesis system used to reproduce
human speech. In the encoder, the input is passed through a multiband filter, each band is passed through
an envelope follower, and the control signals from the envelope followers are communicated to the
decoder. The decoder applies these (amplitude) control signals to corresponding filters in the synthesizer.
Since the control signals change only slowly compared to the original speech waveform,
the bandwidth required to transmit speech can be reduced. This allows more speech channels to share a
radio circuit or submarine cable. By encrypting the control signals, voice transmission can be secured
against interception.
The vocoder was originally developed as a speech coder for telecommunications applications in the 1930s,
the idea being to code speech for transmission. Transmitting the parameters of a speech model instead of
a digitized representation of the speech waveform saves bandwidth in the communication channel; the
parameters of the model change relatively slowly, compared to the changes in the speech waveform that
they describe. Its primary use in this fashion is for secure radio communication, where voice has to
be encrypted and then transmitted. The advantage of this method of "encryption" is that the original
signal itself is never sent, only the envelopes of the bandpass filters. The receiving unit needs to be set
up with the same channel configuration to resynthesize a version of the original signal spectrum. The
vocoder, as both hardware and software, has also been used extensively as an electronic musical instrument.
Whereas the vocoder analyzes speech, transforms it into electronically transmitted information, and
recreates it, the Voder (from Voice Operating Demonstrator) generates synthesized speech by means of a
console with fifteen touch-sensitive keys and a pedal. It essentially consists of the "second half" of the
vocoder, but with manual filter controls, and requires a highly trained operator.[1][2]

Early 1970s vocoder, custom built for electronic music band Kraftwerk
Contents
1 Theory
2 History
3 Applications
4 Modern implementations
    4.1 Linear prediction-based
    4.2 Waveform-Interpolative
5 Artistic effects
    5.1 Uses in music
    5.2 Voice effects in other arts
6 See also
7 References
8 External links
Theory
The human voice consists of sounds generated by the opening and closing of the glottis by the vocal cords,
which produces a periodic waveform with many harmonics. This basic sound is then filtered by the nose
and throat (a complicated resonant piping system) to produce differences in harmonic content (formants) in
a controlled way, creating the wide variety of sounds used in speech. There is another set of sounds,
known as the unvoiced and plosive sounds, which are created or modified by the mouth in different
fashions.
The vocoder examines speech by measuring how its spectral characteristics change over time. This results
in a series of signals representing these modified frequencies at any particular time as the user speaks. In
simple terms, the signal is split into a number of frequency bands (the larger this number, the more
accurate the analysis) and the level of signal present at each frequency band gives the instantaneous
representation of the spectral energy content. Thus, the vocoder dramatically reduces the amount of
information needed to store speech, from a complete recording to a series of numbers. To recreate speech,
the vocoder simply reverses the process, processing a broadband noise source by passing it through a
stage that filters the frequency content based on the originally recorded series of numbers. Information
about the instantaneous frequency (as distinct from spectral characteristic) of the original voice signal is
discarded; it was not important to preserve this for the purposes of the vocoder's original use as an
encryption aid, and it is this "dehumanizing" quality of the vocoding process that has made it useful in
creating special voice effects in popular music and audio entertainment.
Since the vocoder process sends only the parameters of the vocal model over the communication link,
instead of a point by point recreation of the waveform, it allows a significant reduction in the bandwidth
required to transmit speech.
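The band-level analysis described above can be illustrated with a minimal sketch. It assumes a naive DFT rather than the analog filter bank of a real vocoder, and the function name `band_levels`, the four band edges, and the 20 ms test frame are all hypothetical choices made for illustration.

```python
import cmath
import math

def band_levels(frame, sample_rate, band_edges):
    """Reduce one frame of samples to one level per frequency band."""
    n = len(frame)
    # Naive DFT power for bins 0..n/2 (adequate for a short illustrative frame).
    power = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    levels = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Collect the power of every bin whose frequency falls inside this band.
        in_band = [p for k, p in enumerate(power) if lo <= k * sample_rate / n < hi]
        levels.append(math.sqrt(sum(in_band)) / n)
    return levels

sample_rate = 8000
# 160 samples = 20 ms of a 1 kHz test tone standing in for speech.
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(160)]
edges = [0, 500, 1500, 2500, 4000]  # hypothetical band edges in Hz
levels = band_levels(frame, sample_rate, edges)
print(levels)  # the second band (500-1500 Hz) dominates
```

Here 160 waveform samples are reduced to four numbers per frame, which is the kind of reduction the text refers to; a practical system would use more bands and update the levels many times per second.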

Channel vocoder schematic
Analog vocoders typically analyze an incoming signal by splitting the signal into a number of tuned
frequency bands or ranges. A modulator and carrier signal are sent through a series of these
tuned bandpass filters. In the example of a typical robot voice the modulator is a microphone and the
carrier is noise or a sawtooth waveform. There are usually between 8 and 20 bands.
The amplitude of the modulator for each of the individual analysis bands generates a voltage that is used to
control amplifiers for each of the corresponding carrier bands. The result is that frequency components of
the modulating signal are mapped onto the carrier signal as discrete amplitude changes in each of the
frequency bands.
Often there is an unvoiced band or sibilance channel. This is for frequencies outside of analysis bands for
typical speech but still important in speech. Examples are words that start with the letters s, f, ch or any
other sibilant sound. These can be mixed with the carrier output to increase clarity. The result is
recognizable speech, although somewhat "mechanical" sounding. Vocoders also often include a second
system for generating unvoiced sounds, using a noise generator instead of the fundamental frequency.
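The channel vocoder described above can be sketched in software as follows. This is only an illustration under stated assumptions: the band-pass stage uses a standard second-order digital filter in place of analog circuitry, the envelope follower is a rectifier followed by a one-pole low-pass, and the band centres, Q, cutoff, and test signals are hypothetical values. No unvoiced or sibilance channel is modelled.

```python
import math

def bandpass(signal, center_hz, sample_rate, q=4.0):
    """Second-order band-pass filter with unity gain at the centre frequency."""
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha  # the b1 coefficient is zero for this filter type
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for x in signal:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x1, x2, y1, y2 = x, x1, y, y1
    return out

def envelope(signal, sample_rate, cutoff_hz=50.0):
    """Envelope follower: full-wave rectify, then smooth with a one-pole low-pass."""
    coeff = math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    env, level = [], 0.0
    for x in signal:
        level = coeff * level + (1 - coeff) * abs(x)
        env.append(level)
    return env

def channel_vocode(modulator, carrier, sample_rate, centers):
    """Impose the per-band envelopes of the modulator on the carrier bands."""
    out = [0.0] * len(modulator)
    for fc in centers:
        env = envelope(bandpass(modulator, fc, sample_rate), sample_rate)
        car = bandpass(carrier, fc, sample_rate)
        for i, (e, c) in enumerate(zip(env, car)):
            out[i] += e * c  # the envelope acts as the amplifier control signal
    return out

sample_rate = 8000
n = 1600  # 0.2 s
# Modulator: a 1 kHz tone for the first half, silence afterwards (stand-in for a voice).
modulator = [math.sin(2 * math.pi * 1000 * t / sample_rate) if t < n // 2 else 0.0
             for t in range(n)]
# Carrier: a 100 Hz sawtooth, rich in harmonics.
carrier = [2 * ((100 * t / sample_rate) % 1.0) - 1 for t in range(n)]
centers = [200 * 2 ** (i / 2) for i in range(9)]  # nine bands, 200 Hz to 3200 Hz
out = channel_vocode(modulator, carrier, sample_rate, centers)
```

The output follows the modulator's loudness in each band while its pitch comes entirely from the carrier, which is why the result sounds "mechanical": when the modulator falls silent, the carrier is muted as well.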
History

SIGSALY (1943–1946) speech encipherment system

HY-2 Vocoder (designed in 1961), the last generation of channel vocoder in the US.[3]

The first experiments with a vocoder were conducted in 1928 by Bell Labs engineer Homer Dudley, who
was granted a patent for it on March 21, 1939.[4] The Voder (Voice Operating Demonstrator) was
introduced to the public at the AT&T building at the 1939–1940 New York World's Fair.[2] The Voder
consisted of a series of manually controlled oscillators, filters, and a noise source. The filters were
controlled by a set of keys and a foot pedal to convert the hisses and tones into vowels, consonants, and
inflections. This was a complex machine to operate, but with a skilled operator it could produce
recognizable speech.[2][media 1]

Dudley's vocoder was used in the SIGSALY system, which was built by Bell Labs engineers in 1943.
SIGSALY was used for encrypted high-level voice communications during World War II. Later work in this
field has been conducted by James Flanagan.
Applications
Terminal equipment for Digital Mobile Radio (DMR) based systems.
Digital Trunking
DMR TDMA
Digital Voice Scrambling and Encryption
Digital WLL
Voice Storage and Playback Systems
Messaging Systems
VoIP Systems
Voice Pagers
Regenerative Digital Voice Repeaters
Cochlear Implants
Musical and other artistic effects
Modern implementations
Main articles: Speech codec and Audio codec
Even with the need to record several frequencies, and additional unvoiced sounds, the compression of
vocoder systems is impressive. Standard speech-recording systems capture frequencies from about
500 Hz to 3400 Hz, where most of the frequencies used in speech lie, typically using a sampling rate of
8 kHz (slightly greater than the Nyquist rate). The sampling resolution is typically at least 12 bits
per sample (16 is standard), for a final data rate in the range of 96–128 kbit/s, but a good
vocoder can provide a reasonably good simulation of voice with as little as 2.4 kbit/s of data.
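The figures in this paragraph follow from simple arithmetic, worked through below. The variable names are illustrative only; the numbers are the ones quoted in the text.

```python
sample_rate_hz = 8000  # sampling rate quoted above
vocoder_kbps = 2.4     # low-rate vocoder figure quoted above

# Uncompressed data rate in kbit/s for 12-bit and 16-bit samples.
pcm_kbps = {bits: sample_rate_hz * bits / 1000 for bits in (12, 16)}

# How many times smaller the vocoder stream is than each uncompressed stream.
ratios = {bits: rate / vocoder_kbps for bits, rate in pcm_kbps.items()}

for bits in pcm_kbps:
    print(f"{bits}-bit: {pcm_kbps[bits]:.0f} kbit/s, about {ratios[bits]:.0f}x the vocoder rate")
```

At 2.4 kbit/s the vocoder stream is therefore roughly 40 to 53 times smaller than the uncompressed recording.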
'Toll Quality' voice coders, such as ITU G.729, are used in many telephone networks. G.729 in particular
has a final data rate of 8 kbit/s with superb voice quality. G.723 achieves slightly worse quality at data rates
of 5.3 kbit/s and 6.4 kbit/s. Many vocoder systems use lower data rates, but below 5 kbit/s voice
quality begins to drop rapidly.
Several vocoder systems are used in NSA encryption systems:
LPC-10, FIPS Pub 137, 2400 bit/s, which uses linear predictive coding
Code-excited linear prediction (CELP), 2400 and 4800 bit/s, Federal Standard 1016, used in STU-III
Continuously variable slope delta modulation (CVSD), 16 kbit/s, used in wide band encryptors such as
the KY-57.
Mixed-excitation linear prediction (MELP), MIL-STD-3005, 2400 bit/s, used in the Future Narrowband
Digital Terminal (FNBDT), NSA's 21st century secure telephone.
Adaptive Differential Pulse Code Modulation (ADPCM), former ITU-T G.721, 32 kbit/s used
in STE secure telephone
(ADPCM is not a proper vocoder but rather a waveform codec. ITU has gathered G.721 along with some
other ADPCM codecs into G.726.)
Vocoders are also currently used in psychophysics, linguistics, computational neuroscience and
cochlear implant research.
Modern vocoders that are used in communication equipment and in voice storage devices today are based
on the following algorithms:
Algebraic code-excited linear prediction (ACELP 4.7 kbit/s – 24 kbit/s)[5]
Mixed-excitation linear prediction (MELPe 2400, 1200 and 600 bit/s)[6]
Multi-band excitation (AMBE 2000 bit/s – 9600 bit/s)[7]
Sinusoidal-Pulsed Representation (SPR 600 bit/s – 4800 bit/s)[8]
Robust Advanced Low-complexity Waveform Interpolation (RALCWI 2050 bit/s, 2400 bit/s and 2750 bit/s)[9]
Tri-Wave Excited Linear Prediction (TWELP 600 bit/s – 9600 bit/s)[10]
Noise Robust Vocoder (NRV 300 bit/s and 800 bit/s)[11]

Linear prediction-based
Main article: Linear predictive coding
Since the late 1970s, most non-musical vocoders have been implemented using linear prediction, whereby
the target signal's spectral envelope (formant) is estimated by an all-pole IIR filter. In linear prediction
coding, the all-pole filter replaces the bandpass filter bank of its predecessor and is used at the encoder
to whiten the signal (i.e., flatten the spectrum) and again at the decoder to re-apply the spectral shape of
the target speech signal.
One advantage of this type of filtering is that the location of the linear predictor's spectral peaks is entirely
determined by the target signal, and can be as precise as allowed by the time period to be filtered. This is
in contrast with vocoders realized using fixed-width filter banks, where spectral peaks can generally only be
determined to be within the scope of a given frequency band. LP filtering also has disadvantages in that
signals with a large number of constituent frequencies may exceed the number of frequencies that can be
represented by the linear prediction filter. This restriction is the primary reason that LP coding is almost
always used in tandem with other methods in high-compression voice coders.
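As a rough illustration of how an all-pole filter can be estimated and then used to whiten a signal, the sketch below applies the autocorrelation method with the Levinson-Durbin recursion. It is a generic textbook procedure, not the algorithm of any particular codec; the second-order test filter and all function names are assumptions made for the example.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation of a finite sequence for lags 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, err), where a[0] == 1 and A(z) = sum of a[k] z^-k is the
    prediction-error (whitening) filter and err is the residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1 - k * k
    return a, err

def whiten(x, a):
    """Apply A(z) to x, flattening its spectrum (the encoder side)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if k <= n) for n in range(len(x))]

# Test signal: impulse response of a hypothetical all-pole filter 1 / A(z)
# with A(z) = 1 - 1.3 z^-1 + 0.6 z^-2, standing in for a single resonance.
true_a = [1.0, -1.3, 0.6]
x = []
for n in range(400):
    sample = 1.0 if n == 0 else 0.0
    if n >= 1:
        sample -= true_a[1] * x[n - 1]
    if n >= 2:
        sample -= true_a[2] * x[n - 2]
    x.append(sample)

a, err = levinson_durbin(autocorrelation(x, 2), 2)
residual = whiten(x, a)  # close to a single unit impulse
```

Because the test signal was itself produced by an all-pole filter of the same order, the estimated coefficients match the true ones and the residual collapses to an impulse. Real speech is only approximately all-pole, so the residual there still carries excitation information that the decoder needs alongside the coefficients.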
Waveform-Interpolative
The Waveform-Interpolative (WI) vocoder was developed at AT&T Bell Laboratories around 1995 by W.B.
Kleijn, and subsequently a low-complexity version was developed by AT&T for the DoD secure vocoder
competition. Notable enhancements to the WI coder were made at the University of California, Santa
Barbara. AT&T holds the core patents related to WI, and other institutes hold additional patents. Using
these patents as a part of a WI coder implementation requires licensing from all IPR holders.[12][13][14]

Artistic effects
See also: List of vocoders
Uses in music
Main article: Synthesizer
For musical applications, a source of musical sounds is used as the carrier, instead of extracting the
fundamental frequency. For instance, one could use the sound of a synthesizer as the input to the filter
bank, a technique that became popular in the 1970s.
Werner Meyer-Eppler, a German scientist with a special interest in electronic voice synthesis, published a
thesis in 1948 on electronic music and speech synthesis from the viewpoint of sound synthesis,[15] and was
instrumental in the founding in 1951 of a studio for electronic music at the WDR radio station in Cologne.[16]

Siemens Synthesizer (c. 1959) at the Siemens Studio for Electronic Music
One of the first uses of a vocoder to create music was with the Siemens Synthesizer at the Siemens
Studio for Electronic Music, developed between 1956 and 1959.[17][media 2]

In 1968, Robert Moog developed one of the first solid-state musical vocoders for the electronic music
studio of the University at Buffalo.[18]

In 1968, Bruce Haack built a prototype vocoder, named "Farad" after Michael Faraday,[19] and it was first
featured on "The Electronic Record For Children" released in 1969 and then on his rock album The Electric
Lucifer released in 1970.[20][media 3]

In 1970 Wendy Carlos and Robert Moog built another musical vocoder, a 10-band device inspired by the
vocoder designs of Homer Dudley. It was originally called a spectrum encoder-decoder, and later referred
to simply as a vocoder. The carrier signal came from a Moog modular synthesizer, and the modulator from
a microphone input. The output of the 10-band vocoder was fairly intelligible, but relied on specially
articulated speech. Later improved vocoders use a high-pass filter to let some sibilance through from the
microphone; this ruins the device for its original speech-coding application, but it makes the "talking
synthesizer" effect much more intelligible.
Carlos and Moog's vocoder was featured in several recordings, including the soundtrack to Stanley
Kubrick's A Clockwork Orange in which the vocoder sang the vocal part of Beethoven's "Ninth Symphony".
Also featured in the soundtrack was a piece called "Timesteps," which featured the vocoder in two
sections. "Timesteps" was originally intended as merely an introduction to vocoders for the "timid listener",
but Kubrick chose to include the piece on the soundtrack, much to the surprise of Wendy Carlos.[citation needed]

In 1972, Isao Tomita's first electronic music album Electric Samurai: Switched on Rock was an early
attempt at applying speech synthesis technique through a vocoder[citation needed] in electronic rock and pop
music. The album featured electronic renditions of contemporary rock and pop songs, while utilizing
synthesized voices in place of human voices. In 1974, he utilized synthesized voices again in his
popular classical music album Snowflakes are Dancing, which became a worldwide success and helped
popularize electronic music.[21]

Kraftwerk's Autobahn (1974) was one of the first successful albums to feature vocoder vocals. Another of
the early songs to feature a vocoder was "The Raven" on the 1976 album Tales of Mystery and
Imagination by progressive rock band The Alan Parsons Project; the vocoder also was used on later
albums such as I Robot. Following Alan Parsons' example, vocoders began to appear in pop music in the
late 1970s, for example, on disco recordings. Jeff Lynne of Electric Light Orchestra used the vocoder in
several albums such as Time (featuring the Roland VP-330 Plus MkI). ELO songs such as "Mr. Blue Sky"
and "Sweet Talkin' Woman", both from Out of the Blue (1977), use the vocoder extensively. Featured on the
album are the EMS Vocoder 2000W MkI, and the EMS Vocoder (-System) 2000 (W or B, MkI or II).

Audio sample: "Mr. Blue Sky" by the Electric Light Orchestra (1977), a classic example of a singing vocoded voice.
Giorgio Moroder made extensive use of the vocoder on the 1975 album Einzelgänger and on the 1977
album From Here to Eternity.
Another example is Pink Floyd's album Animals (1977), where the band put the sound of a barking dog
through the device.
The vocoder has been used at the start and end of the Main Street Electrical Parade at Disneyland and
Walt Disney World since 1979.
Vocoders are often used to create the sound of a robot talking, as in the Styx song "Mr. Roboto" (1983).
Vocoders have appeared on pop recordings from time to time ever since, most often simply as a special
effect rather than a featured aspect of the work. However, many experimental electronic artists of the New
Age music genre utilize the vocoder in a more comprehensive manner in specific works, such as Jean
Michel Jarre (on Zoolook, 1984) and Mike Oldfield (on QE2, 1980 and Five Miles Out, 1982). There are also
some artists who have made vocoders an essential part of their music, overall or during an extended
phase. Examples include the German synthpop group Kraftwerk, Stevie Wonder ("Send One Your Love",
"A Seed's a Star") and jazz/fusion keyboardist Herbie Hancock during his late 1970s period. In 1982 Neil
Young used a Sennheiser Vocoder VSM201 on six of the nine tracks on Trans.[22] Tommy James used a
vocoder in the production of his group's (the Shondells) 1968 number one hit "Crimson and Clover".[citation needed]

Perhaps the most heard, yet often unrecognized, example of the use of a vocoder in popular music is
on Michael Jackson's 1982 album Thriller, in the song "P.Y.T. (Pretty Young Thing)". During the first few
seconds of the song, the background voicings ("ooh-ooh, ooh, ooh") behind his spoken words exemplify the
heavily modulated sound of his voice through a vocoder.[23] The bridge also features a vocoder
("Pretty Young Thing / You Make Me Sing"), courtesy of session musician Michael Boddicker.
Among the most consistent users of the vocoder in emulating the human voice are Daft Punk, who have used
this instrument from their first album Homework (1997) to their latest one Random Access Memories (2013)
and consider the convergence of technological and human voice "the identity of their musical
project".[24] For instance, the lyrics of "Around the World" (1997) are integrally vocoder-processed, and
"Get Lucky" (2013) features a mix of human and processed voice.
Voice effects in other arts
See also: Robotic voice effects, Talk box, and Auto-Tune
"Robot voices" became a recurring element in popular music during the 20th century. Apart from vocoders,
several other methods of producing variations on this effect include: the Sonovox, Talk box, and
Auto-Tune,[media 4] linear prediction vocoders, speech synthesis,[media 5][media 6] ring modulation and comb filter.

Audio sample: example of vocoder, a demonstration of the "robotic voice" effect found in film and television.
Vocoders are used in television production, filmmaking and games, usually for robots or talking computers.
The robot voices of the Cylons in Battlestar Galactica were created with an EMS Vocoder
2000.[22] The 1980 version of the Doctor Who theme, as arranged and recorded by Peter Howell, has a
section of the main melody generated by a Roland SVC-350 Vocoder. A vocoder was also used to create
the iconic voice of Soundwave, a character from the Transformers series.
