Acoustic Phonetics PDF


Acoustic phonetics

The science that describes sound is known as acoustics:
the study of the physical properties of sound waves.

Acoustic phonetics is concerned with describing the different
kinds of acoustic signal that the movement of the vocal organs
gives rise to in the production of speech.

Speech Production system

1. The system below the larynx
2. The larynx and the surrounding structures
3. The structures and the airways above the larynx


The trachea: 2.5 cm² cross-sectional area, 10-12 cm in length (adults)
The bronchi:
Alveolar sacs: lie within the lungs
Lungs: vital capacity 3000-5000 cm³ (the maximum range of lung volume available);
excursions during normal breathing 1000 cm³

The principal muscles for inspiration are:
1. Diaphragm: contraction lowers the floor of the chest cavity
2. External intercostals: contraction raises the ribcage

The principal muscles for expiration are:
1. Internal intercostals: contraction pulls the ribcage downwards
2. Abdominal muscles
The elastic recoil of the lungs always
contributes an expiratory force, but this
force is augmented or reduced by the
action of expiratory or inspiratory muscles.

The principal structures in the larynx that play a direct role in the
production of speech are the vocal folds.
Vocal folds: two bands or cordlike segments of tissue
Length: 1.0 to 1.5 cm
Thickness: 2 to 3 mm
They lie roughly parallel to each other in an antero-posterior direction.

Vocal folds and ventricular folds are either adducted (approximated)
or abducted (separated, leaving a space between the vocal folds).

Sound is a pattern of pressure variation
that moves in waves from a source.
Sound waves are the means of acoustic
energy transmission between a sound
source and a sound receiver.
Pressure fluctuations move through
space but each particle moves only a
small distance.

Sound is experienced when pressure fluctuations reach the
eardrum and the auditory system translates these movements into
neural impulses.
Sound pressures perceived by human ears range from
20 µPa (micropascals) to 20 Pa (pascals).
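
The pressure range of hearing is often expressed on a logarithmic scale. The sketch below assumes the range meant above is 20 micropascals to 20 pascals and uses the conventional sound pressure level formula with 20 µPa as the reference; the decibel conversion itself is not stated in the slides and is included only as an illustration.

```python
import math

# Reference pressure in pascals (20 micropascals); assumed here as the
# lower limit of hearing.
P_REF = 20e-6


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB relative to P_REF."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_pa / P_REF)


if __name__ == "__main__":
    for p in (20e-6, 20.0):
        print(f"{p} Pa -> {spl_db(p):.1f} dB SPL")
```

On this scale the lower limit sits at 0 dB and the upper limit at roughly 120 dB, which is why a million-fold pressure range becomes manageable.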

Propagation of sound
A sound produced at a source sets up
a sound wave that travels through the
acoustic medium.
Sound waves are small differences in
air pressure which diffuse in all directions.
An acoustic waveform is a record of
sound-producing pressure fluctuations
over time.

Types of waves
Transverse Wave:
In a transverse wave the motion of
the individual particles is
perpendicular to the motion of the wave.
e.g. the Mexican wave.
Longitudinal Wave:
The motion of the individual particles
is parallel to the motion of the wave.
e.g. sound waves.

Sound waves consist of air pressure variations.
The speed at which these air
pressure variations spread through a
space is called the speed of sound.
The speed of sound depends on the
density and the elasticity of the medium.
The speed of sound is around 344 m/s
in dry air at 21 degrees Celsius.
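
As a rough check on the figure above, the sketch below estimates the speed of sound in dry air from temperature using the ideal-gas relation c = sqrt(gamma * R * T / M). The formula and the constants (gamma, R, M) are standard textbook values assumed for illustration; they do not come from the slides.

```python
import math

GAMMA = 1.4          # adiabatic index of dry air (assumed)
R = 8.314462618      # molar gas constant, J/(mol K)
M_AIR = 0.0289647    # molar mass of dry air, kg/mol (assumed)


def speed_of_sound(celsius: float) -> float:
    """Approximate speed of sound in dry air, in m/s."""
    kelvin = celsius + 273.15
    return math.sqrt(GAMMA * R * kelvin / M_AIR)


if __name__ == "__main__":
    print(f"{speed_of_sound(21.0):.1f} m/s at 21 degrees Celsius")
```

The result at 21 degrees Celsius lands within about 1 m/s of the 344 m/s quoted above.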

Types of sound
1. Periodic sound: periodic sounds have
a pattern that repeats at regular intervals.
2. Aperiodic sound: aperiodic sounds do
not have a regularly repeating
pattern; they have either a random
waveform or a pattern that doesn't repeat.

Periodic Sounds

1. Simple periodic sounds
2. Complex periodic sounds

Simple periodic waves

Simple periodic waves are also called
sine waves.
The name comes from the fact that a wave of this
shape graphs the sine function of an angle
as it moves from 0° to 360° (one cycle).
A cosine wave has the same shape as a sine wave,
but begins at the maximum value (1)
rather than 0.

Any wave that has the shape of a sine wave,
regardless of differences in phase, is called
a sinusoidal wave.
Sinusoidal waves result from simple harmonic motion.
A representation in which the sound
pressure is plotted vertically against
a horizontal time axis is called an
oscillogram or waveform.

In order to define a sine wave, we need to
know three principal dimensions:
frequency, amplitude and phase.


What is frequency?
The number of times the sinusoidal
pattern repeats per unit of time.
Each repetition of the pattern is
called a cycle.
The duration of a cycle is its period.
Frequency is expressed in cycles per
second, which by convention are called
Hertz (Hz).




How do we get the frequency of a sine wave in Hertz?
Divide one second by the period (the
duration of one cycle):

f = 1/T, where T is the period in seconds
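
The relation f = 1/T can be tried out directly. This is a minimal sketch; the function names are illustrative only.

```python
def frequency_hz(period_s: float) -> float:
    """Frequency in Hz for a cycle lasting period_s seconds (f = 1/T)."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


def period_s(frequency: float) -> float:
    """Inverse relation: T = 1/f."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency


if __name__ == "__main__":
    # A cycle lasting 10 ms repeats 100 times per second.
    print(frequency_hz(0.010))
    # A 200 Hz wave completes one cycle in 5 ms.
    print(period_s(200.0))
```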

Amplitude describes the displacement of the vibrating
medium from its rest position.
The maximal displacement from the
zero line is known as the amplitude.
It shows the vertical range of the waveform.

The distance between a maximum
and the next minimum is called the
peak-to-peak amplitude.
The higher the peak-to-peak amplitude, the larger the
difference between the air pressure
maxima and the air pressure minima.
This means that the acoustic signal is
perceived as being louder.
Amplitude is measured in terms of

Damping: The gradual loss of
energy (and amplitude) from cycle to
cycle is known as damping.

Phase: The exact position of a
specific point in a waveform.
It is measured in terms of degrees.
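
The three dimensions of a sine wave (frequency, amplitude, phase) can be made concrete by generating samples. The sketch below is illustrative only; the function names and the choice of 8 points per cycle are assumptions. It also shows that a cosine is a sine shifted by 90 degrees of phase.

```python
import math


def sinusoid(amplitude: float, freq_hz: float, phase_deg: float, t_s: float) -> float:
    """Value of a sinusoid at time t_s (seconds)."""
    angle = 2.0 * math.pi * freq_hz * t_s + math.radians(phase_deg)
    return amplitude * math.sin(angle)


def one_cycle(amplitude: float, freq_hz: float, phase_deg: float, points: int = 8):
    """Evenly spaced samples covering exactly one period."""
    period = 1.0 / freq_hz
    return [sinusoid(amplitude, freq_hz, phase_deg, i * period / points)
            for i in range(points)]


if __name__ == "__main__":
    sine = one_cycle(1.0, 100.0, 0.0)     # starts at 0
    cosine = one_cycle(1.0, 100.0, 90.0)  # starts at the maximum
    print([round(x, 3) for x in sine])
    print([round(x, 3) for x in cosine])
    print("peak-to-peak:", round(max(sine) - min(sine), 3))
```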

Why are frequency and amplitude important
for acoustic phonetics?
Any oscillating system whose period
and velocity have the inverse
relationship defined above, as
captured by the sine wave, is in
simple harmonic motion.
Systems that oscillate in simple
harmonic motion produce a simple periodic wave.

The mathematics of sinusoidal motion
is well understood.
Sinusoidal waves can be described in
terms of their frequency, amplitude
and phase.
Phase is not usually that important for
speech analysis.
So, if we know the frequency and
amplitude of a sinusoid we know
everything important there is to know
about anything vibrating in simple
harmonic motion.

The French mathematician Jean Baptiste
Joseph Fourier proved in 1807 that
every kind of vibration (including all
complex speech sounds) can be
described as the sum of a set of
simple sinusoids of varying
frequencies and amplitudes.
An understanding of sinusoidal
motion defined by frequency and
amplitude is the key to
understanding all speech sounds.
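
Fourier's idea can be illustrated numerically: build a complex wave by adding sinusoids, then recover the component frequencies and amplitudes with a discrete Fourier transform. This is a small sketch using only the standard library. The sample rate (1000 Hz), the number of samples (100) and the component amplitudes are arbitrary choices for illustration.

```python
import cmath
import math


def synthesize(components, sample_rate, n):
    """Sum of sinusoids; components is a list of (freq_hz, amplitude)."""
    return [sum(a * math.sin(2.0 * math.pi * f * i / sample_rate)
                for f, a in components)
            for i in range(n)]


def spectrum(samples, sample_rate):
    """Map each analysis frequency (Hz) to its estimated amplitude."""
    n = len(samples)
    result = {}
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        result[k * sample_rate / n] = 2.0 * abs(coeff) / n
    return result


if __name__ == "__main__":
    wave = synthesize([(50, 1.0), (150, 0.5), (250, 0.25)], 1000.0, 100)
    for freq, amp in spectrum(wave, 1000.0).items():
        if amp > 0.1:  # report only clear peaks
            print(f"{freq:.0f} Hz: amplitude {amp:.2f}")
```

The analysis only separates components cleanly here because each one completes a whole number of cycles in the analysed window; real speech needs windowing and longer signals.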

Complex periodic waves

The result of adding sinusoids or simple
periodic waves is a complex wave.
A complex wave is not itself sinusoidal,
but it is periodic.
Because a complex wave is made up of a
number of component frequencies,
the basic frequency, the rate at which
the whole pattern repeats, is called the
fundamental frequency (F0).

F0 determines the pitch of a sound

The loudness of the sound depends
on both frequency and amplitude.
Given an F0, the greater the overall
amplitude, the louder the sound.
The component frequencies are
called harmonics.
The different frequencies and
amplitudes of the component
harmonics give the sound its quality.

The fundamental frequency is always
equal to the greatest common factor
of the component frequencies.
Component waves of 50 Hz, 150 Hz,
and 250 Hz will have an F0 of 50 Hz.
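
The greatest-common-factor rule is easy to verify. The sketch below assumes the component frequencies are whole numbers of Hertz; the function name is illustrative.

```python
from functools import reduce
from math import gcd


def fundamental_hz(component_freqs_hz):
    """Greatest common factor of integer component frequencies."""
    if not component_freqs_hz:
        raise ValueError("need at least one component frequency")
    return reduce(gcd, component_freqs_hz)


if __name__ == "__main__":
    print(fundamental_hz([50, 150, 250]))
    # The fundamental need not be one of the components:
    print(fundamental_hz([200, 300]))
```

Components at 200 Hz and 300 Hz repeat together every 10 ms, so the whole pattern has an F0 of 100 Hz even though no 100 Hz component is present.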

Aperiodic waves
The moment-to-moment pressure
variations are random; there is no
repeating pattern.
A special category of aperiodic sound
is the transient.
Transient sounds are instantaneous:
there is a momentary disturbance that is
not drawn out or repeated.
e.g. a knock on the table, the slamming of
a door

Resonance: the reinforcement or prolongation of
sound by reflection from a surface or
by the synchronous vibration of a
neighboring object.
Natural resonant frequency: Every
object has a basic frequency, or a set
of frequencies, at which it will
naturally oscillate when energy is applied.
If an input frequency is synchronized
with the natural frequency of an object,
the two systems are in resonance.
When energy is applied in resonance
with a natural frequency, the amplitude
of movement at that frequency is
increased, because the two forces are
acting together.
When energy is applied that is not in
resonance with a natural frequency, that
energy is quickly dissipated because the
forces are cancelling each other out, and
amplitude at that frequency dies out.

Objects do not vibrate freely; they
are tuned to resonate only to a
narrow frequency band.
If the frequency of the sound from a
source happens to match the natural
resonant frequency of the object, the
object will vibrate in resonance with
the sound, passing along the pattern
of vibration at a high amplitude;
otherwise the sound energy
dissipates and the vibration dies out.

The resonating body thus acts as a
filter, allowing only some
frequencies to get through: resonant
frequencies are amplified and the
other frequencies are lost.
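
The filtering behaviour of a resonator can be sketched with the steady-state response of a driven, damped simple harmonic oscillator. This model is an assumption brought in for illustration, not something stated in the slides, and the 500 Hz resonance and 50 Hz bandwidth are made-up values.

```python
import math


def resonator_gain(freq_hz: float, resonant_hz: float, bandwidth_hz: float) -> float:
    """Amplitude gain of a driven damped oscillator, normalised to 1 at 0 Hz."""
    numerator = resonant_hz ** 2
    denominator = math.sqrt((resonant_hz ** 2 - freq_hz ** 2) ** 2
                            + (bandwidth_hz * freq_hz) ** 2)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical resonator: natural frequency 500 Hz, bandwidth 50 Hz.
    for f in (100, 400, 500, 600, 1500):
        print(f"{f:5d} Hz -> gain {resonator_gain(f, 500.0, 50.0):.2f}")
```

Input at the natural frequency is strongly amplified, while input far above it is attenuated, which is the filter behaviour described above.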

The vocal tract as a sound-producing
device: source-filter theory
The vocal tract is a resonating
In a vowel sound the vibrating vocal
folds provide the driving force,
which induces resonance in the air
trapped in the vocal tract.
The energy is output as sound.
This is known as the source-filter
theory of speech sounds.

Vocal tract sound sources may be
periodic or aperiodic.
The vibrating vocal folds provide a
periodic source, which dominates in
An aperiodic source is most
important for obstruents.
The turbulence created by a fricative
or by aspiration is sustained aperiodic noise.
The release burst of a stop is a transient.

Given the right amount of tension and
the right amount of egressive airstream,
the vocal folds vibrate.
The opening and closing of the vocal
folds in the air column provides
repeated bursts of air pressure.
The complex vibration of the vocal folds
provides a rich source and generates
waveforms composed of multiple
harmonic frequencies.

The complex movements of the vocal
folds lead to complex signals, which
carry frequencies far above the
fundamental frequency.
This richness of the source signal
allows us to produce many different
speech sounds from the same source
signal by filtering it with the vocal tract.

Vocal Tract Filter

The vocal tract can be approximated
by a cylindrical tube, which is open
at one end (the lips) and virtually
closed at the glottis.
The length and width of the tube
determine the acoustic properties of
the tube.
The bending of the tube has little
effect on the acoustic properties.
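
For a uniform tube closed at one end and open at the other, the resonances fall at odd multiples of a quarter wavelength: F_n = (2n - 1) * c / (4 * L). This standard result is not given in the slides; the sketch below applies it with the speed of sound quoted earlier (344 m/s) and an assumed tube length of 17.5 cm, a figure often used for an adult vocal tract.

```python
def tube_resonances(length_m: float, speed_m_s: float = 344.0, count: int = 3):
    """First `count` resonances (Hz) of a tube closed at one end, open at the other."""
    if length_m <= 0:
        raise ValueError("length must be positive")
    return [(2 * n - 1) * speed_m_s / (4.0 * length_m)
            for n in range(1, count + 1)]


if __name__ == "__main__":
    # Assumed length: 0.175 m.
    for n, f in enumerate(tube_resonances(0.175), start=1):
        print(f"resonance {n}: {f:.0f} Hz")
    # A shorter tube has higher resonances.
    print([round(f) for f in tube_resonances(0.15)])
```

The resonances come out near 500, 1500 and 2500 Hz, and shortening the tube raises all of them, which matches the statement that tube length determines the acoustic properties.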

The resonance frequencies of the
vocal tract are very important and are
called the formant frequencies.
The formant frequencies are numbered
and are named F1, F2, F3, etc.
The numbering of the formant
frequencies has nothing to do with
the fundamental frequency.
F0 is a property of the vocal fold
vibration (the voice source) and the
formant frequencies (F1, F2, F3, ...) are
properties of the vocal tract (the filter).

Formants are the property of the vocal
tract itself, independent of whether a
laryngeal source signal is present or not.
The shape of the vocal tract determines
the formants, whether there is a source
signal or not.
Formant frequencies do not always
correspond to the harmonics of the
laryngeal signal.

The position of the articulators
determines the location of the formants.
Since the formant frequencies
depend on the shape of the vocal tract, it is possible
to formulate some general rules about
how the position of the articulators
influences the formant frequencies on
the basis of perturbation theory
(Chiba and Kajiyama, 1941).
Perturbation theory is a way to compute
(explain) whether the resonance
frequencies for an arbitrarily constricted
tube are higher or lower than those for
an unconstricted cylindrical tube.

As a rule of thumb, low vowels in the
vowel quadrilateral have a high F1 and
high vowels have a low F1.
Similarly, front vowels have a high F2
and back vowels have a low F2.
The terms low, high, front and back
refer to positions in the vowel
quadrilateral that reflect idealized
tongue positions, i.e. an articulatory
Formant frequency values can serve as
a basis for a rough classification of
different vowels.
This holds across different speakers,
languages, and dialects (Peterson &
Barney, 1952).

Acoustics of vowels

F1 correlates with the size of the pharyngeal cavity and the
degree of lip opening, i.e. vowel openness or height (when the
tongue is high, the pharyngeal cavity is larger, as in [i],
resulting in a lower F1).
F2 correlates with the length of the oral cavity, i.e.
frontness/backness (the longer the oral cavity, due to a more
retracted tongue, the lower F2).
Lip rounding protracts the oral cavity and thus
will decrease F2.

Identifying vowel quality
based on formant frequencies

F1 is inversely related to height
F2 is related to frontness/backness
(Lip) rounding lowers formant
values (esp. F2)
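
The rules of thumb above can be turned into a very rough classifier. The sketch below is only an illustration: the boundary values (500 Hz for F1, 1500 Hz for F2) and the example formant values are assumptions chosen for demonstration, not figures from the slides, and real boundaries vary with speaker and language.

```python
def describe_vowel(f1_hz: float, f2_hz: float,
                   f1_boundary_hz: float = 500.0,
                   f2_boundary_hz: float = 1500.0) -> str:
    """Rough articulatory label from F1 and F2 (boundaries are hypothetical)."""
    # F1 is inversely related to height: high F1 -> low vowel.
    height = "low" if f1_hz > f1_boundary_hz else "high"
    # High F2 -> front vowel, low F2 -> back vowel.
    backness = "front" if f2_hz > f2_boundary_hz else "back"
    return f"{height} {backness}"


if __name__ == "__main__":
    # Illustrative formant values only.
    print(describe_vowel(300.0, 2300.0))  # an [i]-like vowel
    print(describe_vowel(700.0, 1100.0))  # an [a]-like back vowel
    print(describe_vowel(300.0, 900.0))   # a [u]-like vowel
```

A two-way split on each formant ignores mid and central vowels and the effect of lip rounding, so this should be read as a demonstration of the direction of the relationships rather than a usable vowel recogniser.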

Acoustic properties of plosives

Four acoustic properties of plosives

1. Duration of stop gap: the silent period in the closure phase
i.e. the closure durations of /p, t, k/ are
longer than those of /b, d, g/
2. Voicing bar: a dark bar that is shown at the low
frequencies, usually below 200 Hz
i.e. only for voiced plosives /b, d, g/; it
is a primary indicator of voicing in the
spectrogram, and all kinds of voiced
sounds, including vowels, show this voicing
bar at such low frequencies
3. Release burst: a strong vertical spike
i.e. in general, we observe a stronger
spike for /p, t, k/ than for /b, d, g/
4. Aspiration: a short frication noise before the vowel
formants begin, usually within 30 ms
i.e. /p, t, k/ of a stressed syllable in
initial position, e.g. /pʰ/ in pin.

Aspiration is not the same as the release
burst. The period of aspiration (which only
some voiceless plosives have) is much
longer than the very short release burst
(which all released plosives have).
High-intensity noise of /p/ and /b/
appears in the range of 3,000-5,000 Hz.

Voiced stops identified using formant



Fricatives can be divided into sibilants and non-sibilants.
Sibilants include [s, ʃ, z, ʒ]. Sibilants
involve a turbulent airstream that
strikes an obstacle, such as the teeth.
Non-sibilants involve turbulence at the
site of constriction. Sibilants tend to be
louder than non-sibilants.

Most of their acoustic energy occurs at
higher frequencies, e.g. the bulk of the
turbulence of both /s/ and /z/ occurs above
3500 Hz, reaching as high as 10,000
Hz, and /ʃ/ has most of its acoustic energy
from around 2000 Hz up to 10,000 Hz.
Voiced fricatives show aspects of both
regular vocal fold vibrations and a
randomly turbulent airstream. Unlike
their voiceless counterparts, the
voiced fricatives have a substantial voicing
bar occupying approximately the lower
400 Hz.

The typical properties of /f/ include
high frequency turbulence
concentrated between 3000 and 4000 Hz.
The voiced labiodental fricative /v/
also shows high frequency
turbulence focused above 4,000 Hz,
but it is not stronger than /f/.
There is no voicing bar with /f/, but
there is a substantial voicing bar for
/v/ occupying approximately the
lower 400 Hz.

The fricative /h/ is voiceless. There is no
voicing bar for /h/, and its turbulence
appears to be strongest around 1000 Hz.

Like vowels, approximants are:
highly resonant
produced with a relatively open vocal tract
characterised by identifiable formant structures
continuant sounds, since there is no occlusion
or momentary stoppage of the airstream
non-turbulent, due to the lack of constriction
oral sounds

They have faint formant structures,
and they all have a low F1 (below
1000 Hz) as they are voiced.
For /w/, a large downward transition of
F2 is characteristic due to the back
tongue constriction.
Lip rounding lowers the intensity of
all formants, particularly F3.
So /w/ has F1 (250-450 Hz), F2 (600-850 Hz), and F3 (2000-2400 Hz).

For /j/, the tongue is in the position for a
front half-close to close vowel
(depending on the degree of
openness of the following sound).
Therefore it has a similar formant
pattern to /i/.
The lips are neutral to spread, but
rounded in anticipation of rounded
vowels. It has a low F1 (200-300 Hz)
and a high F2 (1850-2100 Hz) and
F3 (2620-3050 Hz).

/r/ is characterized by a very low F3
due to retroflex articulation; F3 is
usually below 2000 Hz, sometimes
falling to as low as 1500 Hz.
The frequency of F1 appears to be
related to lip rounding, i.e. a low F1
indicates lip rounding.
/r/ normally has F1 (300-350Hz), F2
(1000-1200Hz) and F3(1600

For /l/, F1 is low and there is no
continuous transition at vowel
F1 approx. 200-400 Hz; F1 rises to
all vowel targets except high front.
F2 approx. 950-1500 Hz (lowest for
back vowels)
F3 approx. 2700-3200 Hz.
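
The formant ranges quoted for /w/, /j/ and /l/ can be collected into a small lookup that reports which approximants are compatible with a measured F1, F2 and F3. This is only a sketch: the function and table names are illustrative, /r/ is left out because its F3 range is incomplete in the text, and real tokens often fall outside such ranges.

```python
# Formant ranges in Hz as quoted in the text: (F1, F2, F3).
APPROXIMANT_RANGES = {
    "w": ((250, 450), (600, 850), (2000, 2400)),
    "j": ((200, 300), (1850, 2100), (2620, 3050)),
    "l": ((200, 400), (950, 1500), (2700, 3200)),
}


def matching_approximants(f1_hz: float, f2_hz: float, f3_hz: float):
    """Return the approximants whose quoted ranges contain all three formants."""
    measured = (f1_hz, f2_hz, f3_hz)
    matches = []
    for symbol, ranges in APPROXIMANT_RANGES.items():
        if all(low <= value <= high
               for value, (low, high) in zip(measured, ranges)):
            matches.append(symbol)
    return matches


if __name__ == "__main__":
    print(matching_approximants(300, 700, 2200))
    print(matching_approximants(250, 2000, 2800))
    print(matching_approximants(300, 1200, 3000))
```

Because F1 is low for all three sounds, it is F2 and F3 that do the work of telling them apart here.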


The formants of all three nasals
are not as dark as they are in vowels.
The frequency of F1 is very low (200-450 Hz) and F3 is more visible
(2500 Hz). F2 is generally not visible.
[m] shows a fairly level F1 with a
downward sloping F2.
[n] shows a downward slope for both
F1 and F2.
[ŋ] shows an upward direction for F2
and a downward direction for F3.
