Audio Theory
Analog Audio
An Analog Audio signal can be graphically represented as a waveform (see Figure 2).
A waveform is made up of peaks and troughs that are a visual representation of
wavelength, period, amplitude (volume level) and frequency (pitch).
The horizontal distance between two successive corresponding points on the wave (for example, two successive peaks) is referred to as the wavelength and is the length of one cycle of the wave. A cycle is one complete repetition of the waveform. The period of the wave refers to the amount of time that it takes for a wave to travel one wavelength (see Figure 6 on the next page).
Amplitude is half the distance from the highest to the lowest point in a wave.
If the amplitude is large, the volume level is comparatively loud; conversely, if the amplitude is small, the volume level is low.
The number of cycles per second is referred to as frequency (see Figure 3).
Frequency is measured in hertz (Hz), a unit of measure named after the German physicist Heinrich Hertz, and indicates the number of cycles per second that pass a specified location in the waveform.
Frequency directly relates to the sound's pitch. The pitch, or key, is how the brain interprets the frequency of the sound created: the higher the frequency, or the faster the sound vibrations occur, the higher the pitch. If the vibrations are slower, the frequency, and therefore the pitch, is lower.
Analog signals are continuous and flexible, able to change at varying rates and sizes, which means that analog audio is relatively unconstrained and unlimited. The flexible nature of analog sound, though seemingly positive, is in fact its biggest disadvantage, as it makes the signal more susceptible to degradation such as distortion and noise.
To convert analog audio to digital, the analog signal is sampled, or 'measured', and assigned a numerical value that the computer can understand and store.
The number of times per second that the computer samples the analog signal is called
its Sample Rate or Sampling Frequency. While the basic unit used to measure
frequency or cycles per second is hertz, when sampling audio it is generally measured
in thousands of cycles per second or kilohertz (kHz).
An audio CD, for example, generally has a sampling rate of 44.1 kHz, that is, forty-four thousand one hundred samples per second, while AM-radio-quality audio is commonly sampled at 11.025 kHz, or eleven thousand and twenty-five samples per second. The more samples taken, the higher the quality of the digital audio signal produced.
Example:
Play the two provided examples, 44100Hz.wav and
11025Hz.wav to hear the difference in sound
quality between the two different sample rates.
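As a further illustration, the short Python sketch below writes two sine-wave tones at these same two sample rates using only the standard library; the file names, tone frequency and duration are assumptions for demonstration and are not the provided example files.

import math
import wave
import struct

def write_tone(path, sample_rate, freq=440.0, seconds=2.0):
    # 16-bit mono PCM: one sample per frame, two bytes per sample.
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        n = int(sample_rate * seconds)
        frames = b"".join(
            struct.pack("<h", int(32767 * math.sin(2 * math.pi * freq * i / sample_rate)))
            for i in range(n)
        )
        w.writeframes(frames)

write_tone("tone_44100.wav", 44100)   # CD-quality sample rate
write_tone("tone_11025.wav", 11025)   # lower, "AM radio quality" rate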
Low sampling rates, below about 40 kHz for full-bandwidth audio, can result in distortion caused by data loss. This is referred to as Aliasing (see Figure 8): frequencies that are too high for the chosen sample rate are misrepresented as spurious lower frequencies, which can cause digitally reconstructed sound to play back poorly. To avoid the aliasing effect, sampling needs to occur at a high enough rate to ensure that the sound's fidelity is maintained, or anti-aliasing needs to be applied when the audio is being sampled. An anti-alias filter can ensure that nothing above half the desired sampling rate enters the digital stream, i.e. any frequencies above that limit are blocked. Be aware that using anti-alias filters may, in turn, introduce further unwanted noise.
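A small Python sketch of the frequency folding behind aliasing (a standard signal-processing relationship, shown here as an illustration):

def alias_frequency(tone_hz, sample_rate_hz):
    # Fold the tone onto the nearest multiple of the sample rate; tones below
    # half the sample rate are unchanged, tones above it fold back down.
    return abs(tone_hz - sample_rate_hz * round(tone_hz / sample_rate_hz))

print(alias_frequency(30_000, 96_000))   # 30000 - below the 48 kHz limit, preserved
print(alias_frequency(30_000, 44_100))   # 14100 - a 30 kHz tone aliases to 14.1 kHz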
Analog audio is a continuous sound wave that
develops and changes over time. After it is
converted to digital audio it is discontinuous, as it
is now made up of thousands of samples per second.
Quantization and Bit Depth
Once an analog signal has been sampled, it is then assigned a numeric value in
a process called Quantization. The number of bits used per sample defines
the available number of values.
Bit is short for binary digit. Computers are based on a binary numbering system that uses two digits, 0 and 1. This differs from the more familiar decimal numbering system, which uses ten digits.
This two-digit system means each additional bit doubles the number of values available: a 1-bit sample has 2 possible values, 0 and 1, while a 2-bit sample has 4 possible values, 00, 01, 10 and 11, and so on (see Figure 9).
The number of bits used per sample is referred to as its Resolution or Bit Depth. This method of measurement is used throughout digital technologies. You may already be familiar with bit depth in digital graphics, where a 1-bit image is black and white, a web-safe or greyscale image is 8-bit, and an RGB image, with one byte or 8 bits allocated to each of the three colors, is 24-bit in total.
Typically, audio recordings have a bit depth of either 8 or 16 bits, or even 24 bits on some systems. An 8-bit sample will allow 256 values, whereas a 16-bit sample will allow 65,536 values. The greater the bit depth, the more accurate the sound reproduction and
the better the sound quality.
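A minimal Python sketch of these relationships, assuming sample values normalized to the range -1.0 to 1.0 (the example value 0.3337 is arbitrary):

def levels(bit_depth):
    # Each additional bit doubles the number of available values.
    return 2 ** bit_depth

def quantize(sample, bit_depth):
    # Snap a sample in the range -1.0..1.0 to the nearest available level.
    steps = levels(bit_depth)
    index = round((sample + 1.0) / 2.0 * (steps - 1))
    return index / (steps - 1) * 2.0 - 1.0

print(levels(8), levels(16))    # 256 65536
print(quantize(0.3337, 8))      # coarse 8-bit approximation
print(quantize(0.3337, 16))     # much closer 16-bit approximation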
An audio signal's Dynamic Range is the difference between its quietest and loudest levels and is measured in decibels (dB). The larger the dynamic range, the greater the risk of distorted sound. Audio files with a large dynamic range tend to require a greater bit depth to maintain sound quality.
An 8-bit sample, with 256 values, can recreate a dynamic range of 48 dB (decibels), which is equivalent to AM radio, whereas a 16-bit sample can recreate a dynamic range of 96 dB, which is the equivalent of CD audio quality.
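The figures quoted above follow the common rule of thumb that each bit contributes roughly 6 dB of dynamic range; a quick Python check:

for bits in (8, 16, 24):
    # Approximate dynamic range in decibels: about 6.02 dB per bit.
    print(bits, "bits ->", round(6.02 * bits), "dB")   # 8 -> 48, 16 -> 96, 24 -> 144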
The dynamic range of the average human ear is approximately 0 to 96 dB (120dB is the
pain threshold), so it is no coincidence that the standard bit depth for CD quality audio is
16-bit.
Bit-Rates
The number of bits used per second to represent an audio recording is defined as Bit-Rate. In digital audio, bit-rates are expressed in thousands of bits per second (kbps).
The bit-rate is directly associated with a digital audio file’s size and sound quality. Lower
bit-rates produce smaller file sizes but inferior sound quality. Higher bit-rates produce
larger files but are of a better sound quality. An uncompressed audio track's bit-rate and
approximate file size can be calculated using the following formulas:
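For uncompressed (PCM) audio, the standard relationships are: bit-rate (bits per second) = sample rate × bit depth × number of channels, and file size (bytes) = bit-rate × duration in seconds ÷ 8. A short Python sketch follows; the CD-quality, 3-minute figures are illustrative assumptions.

def bit_rate_bps(sample_rate, bit_depth, channels):
    # Uncompressed bit-rate: samples per second * bits per sample * channels.
    return sample_rate * bit_depth * channels

def file_size_bytes(sample_rate, bit_depth, channels, seconds):
    # Total bits divided by 8 gives bytes.
    return bit_rate_bps(sample_rate, bit_depth, channels) * seconds // 8

print(bit_rate_bps(44_100, 16, 2))              # 1411200 bps, about 1411 kbps
print(file_size_bytes(44_100, 16, 2, 180))      # 31752000 bytes, about 31.8 MB for 3 minutes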
Audible Frequency refers to the range of frequencies that are detectable by the average
human ear. There is a direct relationship between the sample rate and the highest frequency that can be faithfully captured and reproduced. This relationship is described by the Nyquist Theorem. The Nyquist Theorem, named after Harry Nyquist, a Bell engineer who worked on the speed of telegraphs in the 1920s, is a principle used to determine the correct sampling rate for a sound.
Essentially, the Nyquist Theorem states that a sound needs to be sampled at a rate that is
at least twice its highest frequency in order to maintain its fidelity or sound quality.
Therefore, a sample taken at 44.1 kHz will contain twice the information of a sample taken at 22.05 kHz. Put simply, this means that the highest frequency that can be captured in a digital sample will be exactly half the sampling frequency.
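Expressed as a simple Python helper, using the sample rates discussed in this section:

def nyquist_limit(sample_rate_hz):
    # Highest frequency a given sample rate can capture: half the sample rate.
    return sample_rate_hz / 2

for rate in (44_100, 30_000, 11_025):
    print(rate, "Hz ->", nyquist_limit(rate), "Hz")   # 22050.0, 15000.0, 5512.5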
Average human hearing, at best, covers a range from 20 Hz (low) to 20 kHz (high), so a
sample rate of 44.1 kHz should theoretically cover most audio needs. It is also the
standard for CD audio, which requires near optimum sound quality. Therefore, the higher
the sample rate, the better the quality of sound that is reproduced. However, this also
means that the higher the sample rate, the greater amount of audio data produced and
consequently the larger the file size. This means that there is a direct correlation
between the sample rate, the quality of sound and the file size of the audio file.
An example of how this affects the quality of digital audio is illustrated in Figure 10. A music track with frequency content extending to approximately 20 kHz, the highest audible frequency perceived by the average human ear, needs to be
sampled at 44.1 kHz in order to maintain CD quality sound fidelity. However, if the same
track is sampled at a rate lower than 44.1 kHz, e.g. 30 kHz, then according to the
Nyquist Theorem, the range between 15 kHz and 20 kHz will be lost and therefore the
sound quality will deteriorate.
Sampling below the rate recommended by the Nyquist theorem is usually done only where the sample rate is determined by the transmission technology, for example over telephone wires or within the bandwidth allocated to radio transmission, where low data rates and limited storage space are considered more important than sound quality.
Stereo signals carry a separate channel for each speaker, whereas mono signals deliver identical sound to each speaker, which creates a less natural, 'flat' sound. This is a major consideration when digitising audio, in that it takes twice as much space to store a stereo signal as a mono signal.
Digital Audio Formats
An audio file consists of two main components; a header and the audio data. The header
stores information in relation to Resolution, Sampling Rate and Compression Type.
Sometimes a wrapper is also used, which adds information such as license management or streaming capabilities (see Figure 12).
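As an illustration, the header of a WAV file can be inspected with Python's standard wave module (the file name below is a placeholder):

import wave

with wave.open("example.wav", "rb") as w:
    # The header describes how to interpret the audio data that follows it.
    print("channels:   ", w.getnchannels())              # 1 = mono, 2 = stereo
    print("sample rate:", w.getframerate(), "Hz")        # e.g. 44100
    print("bit depth:  ", w.getsampwidth() * 8, "bits")  # bytes per sample * 8
    print("duration:   ", w.getnframes() / w.getframerate(), "s")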
Digital audio files can be found in a huge variety of file formats but basically these files
can be divided into two main categories:
1. Self–Describing
2. RAW
Self-Describing formats are usually recognized by their file extension. The extension, which is part of the file name, refers to the type and structure of the audio data within the file and tells both the user and the computer how to handle the sound information.
RAW formats are files that are not compressed. They rely on the sound software to
correctly interpret the sound file by reading the data or code of the header component.
File formats are used for different purposes and they vary in terms of file sizes created.
Therefore, when choosing an audio file format, its function and eventual context need to
be considered. This is particularly important when working with audio files for the web.
Common Audio File Formats
Real Audio is a good choice for longer audio clips because it lets you listen to them in 'real time' from your Web browser, and the sound quality of its higher-bandwidth compression settings is good. Real Audio players can be included with a web browser or can be
downloaded from the web.
5. MIDI – Musical Instrument Digital Interface
MIDI, or Musical Instrument Digital Interface, is not an actual audio file format but
rather a music definition language and communications code that contains instructions to
perform particular commands. Rather than representing musical sound directly, MIDI files
transmit information about how music is produced. MIDI is a serial data language,
composed of MIDI messages, often called events, that transmit information about pitch,
volume and note duration to MIDI-Compatible sound cards and synthesizers.
Messages transmitted include the following (a byte-level sketch appears after the list):
• Start playing (Note ON)
• Stop playing (Note OFF)
• Patch change (e.g. change to instrument #25 - nylon string guitar)
• Controller change (e.g. change controller Volume to a value from 0 to 127)
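A minimal Python sketch of the raw bytes behind these messages, using standard MIDI 1.0 status bytes (the note number and velocity values are arbitrary examples):

channel = 0                                    # MIDI channels are numbered 0-15 on the wire
note_on  = bytes([0x90 | channel, 60, 100])    # Note ON: middle C (60), velocity 100
note_off = bytes([0x80 | channel, 60, 0])      # Note OFF: middle C
patch    = bytes([0xC0 | channel, 24])         # Program change to patch #25 (nylon string guitar)
volume   = bytes([0xB0 | channel, 7, 127])     # Controller change: controller 7 (volume) = 127

print(note_on.hex(), note_off.hex(), patch.hex(), volume.hex())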
It was initially developed to allow sequencers to control synthesisers. Older
synthesisers were Monophonic, that is, they were only able to play one note at a time.
Sequencers could control those synthesisers by voltage, together with a trigger or gate signal that indicated whether a key was up or down. Contemporary synthesisers are Polyphonic, enabling them to play many notes at once, which is more complex. A single voltage was not enough to define several keys, so the solution was to develop a special language: MIDI. MIDI files are much smaller than other audio file formats, as they contain only performance information and not the actual sound. The main advantage of MIDI is its small file size, but the disadvantage is the lack of direct control over the sound.
These are the most common audio file formats in the current market but in the past,
computers that had sound capabilities developed their own proprietary file formats.
Any compressor will achieve varied ratios of compression depending on the amount and type of information to be compressed, and there are many different file formats available for both Lossless and Lossy audio compression. The web is the most obvious place where audio compression becomes of paramount importance. Effective data transfer from the Internet to the end user's machine relies on speed and efficiency; therefore, the smaller the file size, the faster the data is transferred.
There are several ways you can reduce the size of an audio file for delivery on the
web. The first and most obvious method would be to consider the length of the track.
There will be a significant difference, for example, between 1 minute of recorded audio and 40 seconds (see Figure 14). The next consideration would be the number of channels: does the track need to be in stereo, or could it be converted to a mono recording? By converting the file to a single channel you have already effectively halved its size and its download time.
Another way to reduce the file size is to change the bit depth, for example from a 16-bit track to an 8-bit track. The final way to reduce the size of an audio file is to alter the
sample rate. The key in creating digital audio files for the web is to experiment with the
various recording settings, in order to find an effective balance between sound quality,
performance and file size (See Figure 14).
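Applying the same uncompressed-size calculation as in the Bit-Rates section, the Python sketch below compares these options; the 60-second clip length is an assumption chosen for illustration.

def size_mb(sample_rate, bit_depth, channels, seconds):
    # Uncompressed size in megabytes: total bits / 8 bytes / 1,000,000.
    return sample_rate * bit_depth * channels * seconds / 8 / 1_000_000

seconds = 60
print(size_mb(44_100, 16, 2, seconds))   # ~10.6 MB - CD-quality stereo
print(size_mb(44_100, 16, 1, seconds))   # ~5.3 MB  - converted to mono
print(size_mb(44_100,  8, 1, seconds))   # ~2.6 MB  - mono, 8-bit
print(size_mb(22_050,  8, 1, seconds))   # ~1.3 MB  - mono, 8-bit, halved sample rate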
Hardware Considerations
1. Video Capture Cards
A video capture card is used together with a computer to pass frames from the video to
the processor and hard disk. When capturing video, ensure that all programs not in use
are closed, as video capture is one of the most system intensive tasks that can be
performed on a computer.
Most capture cards include options for recording with a microphone or line level signal. A Microphone Level Signal is a signal that has not been amplified and has a voltage of about .001 volts (one millivolt); not surprisingly, microphones usually generate microphone level signals. A Line Level Signal has been preamplified and has a voltage of about 1.0 (one full volt), and is generally created by mixing decks, *Video Tape Recorders (VTRs), tape players, DAT players and so on. If your capture card has the option, you will be able to decide which type of signal you are recording. Your capture card may have two different types of connectors. The microphone input is usually (except when using Macintosh system microphones) a 3.5 mm mini-jack stereo connector. The line input is usually a stereo RCA connector or sometimes a three-pin XLR connector.
* Video Tape Recorders (VTRs) are professional recording and playback machines that use reels of magnetic tape.
2. Metering and Monitoring
Your capturing software should also allow you to see a graphic representation of sound
levels – it should display meters. There are different types of meters, which use a
variety of measurements and color codes. Regardless of metering systems used, you
should always use the meter to ensure that the incoming sound does not exceed the
recording abilities of the capture card. Unlike analog systems, which, due to the electrical nature of the signal and the recording medium, can tolerate sounds recorded at levels that momentarily peak above the nominal maximum, digital systems cannot. Digital recorders can only record levels within their range capabilities. If the incoming level exceeds the maximum level, clipping will occur, and the digital sound will be distorted when played back.
3. Sound Cards and Sound Considerations
A sound card is a peripheral device that attaches to the motherboard in the
computer. This enables the computer to input, process and deliver sound. Sound cards
may be connected to a number of other peripheral devices such as:
• Headphones
• Amplified speakers
• An analog input source (microphone, CD player)
• A digital input source (DAT, CD-ROM drive)
• An analog output device (tape deck)
• A digital output device (DAT, recordable CD (CD-R)) (see Figure 15)
The core of the sound card is the audio processor chip and the CODECs. In this
context, CODEC is an acronym for COder/DECoder. The audio processor manipulates the
digital sound and, depending on its capabilities, is responsible for converting sample rates
between different sound sources or adding sound effects. Although the audio processors
deal with the digital domain, at some point, unless you have speakers with a digital input,
you will need to convert the sound back into analog.
Similarly, many of the sound sources that you want to input to your computer will begin
as analog and therefore need to be converted into digital. A sound card therefore needs
some way to convert the audio. DACs (digital to analog converters) and ADCs (analog to
digital converters) are required to convert these audio types and many audio cards have
chips that perform both of these functions. They are also known as CODECs due to their
capability to encode analog to digital and decode digital to analog.
Other factors that can influence the functionality and usability of a sound card are its driver software, along with the number and type of input and output connectors (see Figure 16).
4. DAT Recording
DAT (Digital Audio Tape) is used for recording audio on to tape at a professional level
of quality. A DAT drive is a digital tape recorder with rotating heads similar to those found
in a video deck (see Figure 17). Most DAT drives can record at sample rates of 44.1 kHz (the CD audio standard) and 48 kHz.
Recording on DAT is fast and simple. It is as simple as choosing what you want, setting
the levels and pressing record. DAT has become the standard archiving technology in
recording environments for master recordings. Digital inputs and outputs on professional
DAT decks allow the user to transfer recordings from the DAT tape to an audio
workstation for precise editing. The compact size and low cost of the DAT medium makes
it an excellent way to compile the recordings that are going to be used to create a CD
master.
5. MiniDisc Players
MiniDisc was developed by Sony in the mid-eighties as a portable format that combines the storage qualities of CD with the recordability of cassettes. MiniDisc players are very cost effective and run on mains power or on rechargeable batteries, which last for approximately 14 hours of play time.
While CD-ROMs and DVDs use optical technology, and floppy disks and hard drives use magnetic technology, MiniDisc uses a combination of both to record data. Therefore, care should be taken to protect MiniDiscs from strong magnetic fields. Just like a computer's hard drive, the audio data is recorded digitally and in fragments; this is called Non-Linear recording.
MiniDiscs use sample rates of 48 kHz, 44.1 kHz or 32 kHz. They use compression to enable them to record the equivalent of a full-sized CD on to the 64 mm disc. This compression, called ATRAC (Adaptive Transform Acoustic Coding), incorporates noise reduction and has a compression ratio of about 5:1. Similar to MP3, it reduces data by encoding only frequencies audible to the human ear.
6. Microphones
Computers that have built in microphones are not usually considered to be high-fidelity
devices. When dealing with audio production, the adage 'garbage in, garbage out' applies.
In essence, nothing can fix poorly recorded sound. If your audio is going to be
compressed, or its sample rate and bit depth are reduced, then it is very important to
record clear, dynamic sounds. Choosing a good microphone is very important. There are a
variety of microphones available on the market, each offering different sound qualities
that are outlined in the following section, but firstly, let’s discuss how microphones work.