 |
| The
recording of sound can be divided into two critical areas:
techniques and equipment. In the techniques section, recording
environment, microphone placement, and signal processing are
covered. In the equipment section, microphones, pre-amplifiers,
and recording devices and media are discussed.
|
 |
Recording
Environment - Much
of the success of a speech recording depends on the recording
environment and microphone placement. Ideally, speech recordings
should take place in soundproof studios or labs. If those
are not available, one should try to find a relatively quiet
room with as little low-frequency noise as possible. Typical
sources of low-frequency noise include 60-Hz hum from
electrical equipment, heating and air-conditioning ducts,
elevators, doors, water pipes, computer fans, and other mechanical
systems in the building. If possible, those devices should
be switched off during recording. The figure below illustrates
the spectrum of a typical ambient room noise. The low-frequency
prominence of this kind of noise may interfere with acoustic
analysis of the fundamental frequency and the first formant.
It is also quite difficult to filter out any such extraneous
noise without removing information from the speech signal
itself. High-frequency noise should also be avoided, though
it can be more easily filtered out of the signal before analysis. |

Spectrum
of typical room noise. Note the prominence at around 600 Hz. |
Microphone
Placement - The placement of the microphone
directly affects the intensity of the recorded signal as well
as the signal-to-noise ratio. The inverse square law predicts
a loss of approximately 6 dB per doubling of distance from
the sound source. A typical handheld microphone is usually
placed at a distance of 30 cm or so from the talker’s
lips. Relative to a close placement (say, 4 cm), this represents
a loss of about 18 dB and an increased possibility of noise
leaking into the recording. For this reason, it is recommended
to use a head-mounted condenser microphone (such as the AKG C410
or AKG C420) to maintain a close and constant distance to
the source. Speech signals acquired this way are characterized
by a high SNR and a broad range of intensity. It may also
be useful to use a linear-phase high-pass rumble filter (60
Hz cutoff and 24 dB/octave attenuation), unless low-frequency
components are expected in the signal. |
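The distance figures above can be checked with a few lines of Python. This is a minimal sketch that assumes ideal free-field conditions (a point source and no room reflections); the function name `distance_loss_db` is made up for illustration and does not come from any recording toolkit.

```python
import math

def distance_loss_db(near_cm: float, far_cm: float) -> float:
    """Level drop in dB when the microphone moves from near_cm to far_cm.

    Assumes the free-field inverse square law: sound pressure is
    inversely proportional to distance from the source.
    """
    return 20.0 * math.log10(far_cm / near_cm)

# One doubling of distance, then the 4 cm vs. 30 cm case from the text.
for near, far in [(4, 8), (4, 30)]:
    print(f"{near} cm -> {far} cm: {distance_loss_db(near, far):.1f} dB loss")
```

Under these assumptions the script reports roughly 6.0 dB for one doubling and 17.5 dB for the move from 4 cm to 30 cm, which is consistent with the "about 18 dB" quoted above. In a real room, reflections usually make the measured loss smaller.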
|
Signal Processing and Special Effects
- Long-term preservation
is one of the primary goals of oral history recordings. It
is, therefore, crucial to try to obtain the cleanest, highest-fidelity
signal right at the very first stage of recording. One should
try to capture a wide dynamic range (e.g., 96 dB in 16-bit
digital recordings) and a fairly wide frequency response (e.g.,
0-20,000 Hz). It is also recommended to avoid using too many
special effects at this stage. The only exception
may be a soft limiter to reduce the possibility of accidental
signal overloading. Such "clean" recordings should
be stored for preservation purposes. However, various audio
delivery situations, such as radio broadcasts, web streaming,
CD, etc. may require extra signal processing to make the audio
sound better. Digital signal processing (DSP) effects are,
therefore, best applied in the post-production process.
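The 96 dB figure quoted for 16-bit recordings follows from the number of quantization levels, and the 0-20,000 Hz range implies a minimum sampling rate. The sketch below works through both; it uses the common textbook approximation of about 6 dB per bit and the Nyquist criterion, and the function names are illustrative assumptions only.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Theoretical ratio of full scale to one quantization step, in dB."""
    return 20.0 * math.log10(2 ** bits)

def min_sample_rate_hz(max_freq_hz: float) -> float:
    """Lowest sampling rate that can represent max_freq_hz (Nyquist criterion)."""
    return 2.0 * max_freq_hz

for bits in (8, 16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")

print(f"20 kHz bandwidth needs at least {min_sample_rate_hz(20_000):.0f} Hz sampling")
```

With this approximation, 16 bits gives about 96.3 dB, and a 20 kHz bandwidth needs a sampling rate of at least 40 kHz, which is why rates such as 44.1 kHz or 48 kHz are typical choices.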
There
are four basic types of effects (filters) that can be applied
to speech recordings:
Equalization
(EQ) - Equalization is selective
amplification, or reduction, of a signal based on frequency.
Audio signals consist of combinations of fundamental signals
and their harmonics. Changes to the spectral balance of a
signal involve altering the relationship of the fundamental
to its harmonics. Each harmonic makes up one aspect of the
audible character of a signal. Knowing these relationships
allows you to quickly zero-in on the correct frequency range
of the signal and apply boost or cut to enhance or correct
what you are hearing.
Compressor
- The effect of a compressor is to make loud parts of a signal
softer and to make very soft parts louder. Compression works
particularly well in radio broadcasting and streaming audio
where very soft passages may be lost due to the background
noise in the listening environment and the data compression
algorithm used in the streaming audio format.
De-esser
- The de-esser is a dynamic range controller specially designed
to regulate high-frequency content. The de-essing technique
was developed for motion picture dialogue recording. Speech
sounded more natural and pleasing with the reduction of sibilants
(in words such as: church, test, zest, etc.). By sensing and
limiting certain selected frequencies, the de-esser provides
specific control over some of the higher frequency vocal sounds
which may become overemphasized when the speaker or vocalist
is close to the microphone.
Reverb
- Reverb (reverberation) is an effect that simulates natural
room reverberation. It can be defined as the remainder of
sound that persists after the sound source has stopped. The reverberation
time is defined as the time it takes for the sound level to
decay by 60 dB, that is, for sound intensity to fall to
one-millionth of its initial value.
For good speech intelligibility, too much reverberation is
a hindrance, and can be considered noise, although some reverberation
may add a bit of character and presence to one's voice. |
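Of the four effects above, the compressor is the easiest to express numerically. The sketch below models only the static input/output curve of a simple compressor (no attack or release behavior). The threshold, ratio, and make-up gain values are arbitrary illustrative assumptions, not recommended settings, and `compress_db` is a hypothetical helper name.

```python
def compress_db(level_db: float, threshold_db: float = -20.0,
                ratio: float = 4.0, makeup_db: float = 10.0) -> float:
    """Static compressor curve: output level (dBFS) for an input level (dBFS)."""
    if level_db > threshold_db:
        # Above the threshold, every `ratio` dB of input yields 1 dB of output.
        level_db = threshold_db + (level_db - threshold_db) / ratio
    # Make-up gain raises the whole signal, which lifts the quiet passages.
    return level_db + makeup_db

for level in (-50.0, -20.0, 0.0):
    print(f"in {level:6.1f} dBFS -> out {compress_db(level):6.1f} dBFS")
```

With these assumed settings, a quiet passage at -50 dBFS comes out at -40 dBFS while a peak at 0 dBFS comes out at -5 dBFS, so a 50 dB range is squeezed into 35 dB: loud parts become softer and soft parts louder, as described above.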
 |
Microphones
- Two basic microphone types are most typically
used for recording speech. The dynamic microphone has a diaphragm
that consists of Mylar plastic that has a finely wound coil
of wire (the so-called “voice coil”) attached to its
inner face. This coil is suspended within a strong magnetic
field. Whenever a sound wave hits the diaphragm, the coil
is displaced in proportion to the amplitude of the wave, causing
the coil to cut across the lines of magnetic flux supplied
by the permanent magnet. Since the combined mass of the diaphragm
and the coil is relatively large, the dynamic microphone may not respond
well to sharp transient sounds and it may fail to record minute
changes in voice intensity. This does not mean that dynamic
microphones should not be used for speech recording. On the
contrary, there are several low-cost, rugged microphones,
such as Shure SM58 or Shure SM48 (once recommended by Kay
Elemetrics), that can be used successfully for many speech
recording applications. |
|
|

Marantz
PMD222 analog cassette field recorder. http://www.marantz.com
|
Dynamic
microphones do not require any external power supply, which
makes them a good match for several field recorders (e.g.,
the Marantz PMD222). The condenser microphone works on a different
principle. A thin plastic diaphragm coated on one side with
gold or nickel is placed at a close distance from a stationary
backplate. Once a polarizing voltage (from a 48 V phantom
power supply) is applied to these plates, the two surfaces
create capacitance that varies as the diaphragm moves in response
to a sound wave. Since the diaphragm is very light, the response
of the condenser microphone is very accurate, often producing
a recording that is extremely rich in both frequency and dynamic
response. |
Condenser
microphones are usually more fragile than dynamic microphones
and require a 48 V phantom power supply. Some of them can
use battery packs, but some rely on an external power source.
This makes the condenser microphone a bit more cumbersome
to use in the field, though several DAT and HDD recorders
have an on-board phantom power supply. |
| Frequency
Response - Each microphone type has
a unique frequency response. It is important to remember that
many manufacturers tailor a microphone’s frequency response
to accentuate particular parts of the spectrum to function
optimally within a specific recording application. For the
purposes of acoustic analysis, one should always choose microphones
with a wide and flat frequency response curve.
|

Frequency
response of a Shure Beta 87a microphone. Note the effect of
the roll-off filter at 10 mm.
|
| 
Cardioid
polar pattern of a Shure Beta 87a microphone |
Polar
Patterns - The microphone’s
polar pattern should play a crucial role in choosing a microphone
for a specific recording application. The polar pattern is
a plot of the sensitivity of a microphone as a function of
the angle around that device. There are several common polar
pattern types used in microphones today. The omnidirectional
microphone records sound equally from all directions. Such
microphones are most commonly used as built-in or lavalier
types. They seem to be very good for recording interviews,
though their 360-degree pick-up range introduces too much
noise to the signal for it to be used reliably in acoustic
analysis.
The
cardioid (heart-shaped) pattern is most sensitive to sounds
coming from the front. It is 6 dB less sensitive to sounds
from 90 degrees to the sides, and, in theory, is completely
insensitive to sounds coming from the rear. The most important
attribute of a cardioid (directional) microphone is its ability
to discriminate between direct sounds (coming from the direction
in which it is pointed) and reverberant, unwanted sounds from
all other directions. This type of polar pattern usually produces
signals that are substantially less noisy than those captured
with an omnidirectional microphone. |
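The 6 dB figure at 90 degrees and the rear null both follow from the idealized cardioid equation, in which sensitivity is proportional to 0.5 × (1 + cos θ). The sketch below evaluates that textbook model; real microphones only approximate it, and their patterns vary with frequency. The function name is an illustrative assumption.

```python
import math

def cardioid_sensitivity_db(angle_deg: float) -> float:
    """Sensitivity of an ideal cardioid relative to on-axis (0 degrees), in dB."""
    gain = 0.5 * (1.0 + math.cos(math.radians(angle_deg)))
    if gain < 1e-12:
        return float("-inf")  # theoretical null directly behind the microphone
    return 20.0 * math.log10(gain)

for angle in (0, 90, 180):
    print(f"{angle:3d} deg: {cardioid_sensitivity_db(angle):.1f} dB")
```

The model gives 0 dB on axis, about -6 dB at the sides, and complete rejection at the rear. In practice, rear rejection of a physical cardioid microphone is finite, which is why the text qualifies it with "in theory".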
| Proximity
Effect - Usually, high-quality speech
recordings require the sound source to be fairly close to
the microphone’s diaphragm. This may trigger a so-called
proximity effect. Proximity effect is the increase in the
low-frequency sensitivity of a microphone when the sound source
is close to it. This is particularly true of cardioid, directional
microphones. To counter that, most high-end directional microphones
use a low-frequency roll-off filter to restore the response
to its flat, natural balance. Some microphones have a user-selectable
switch to control the filter. The proximity effect may be
responsible for speech spectra showing emphasis in the low-frequency
range, around the first and second harmonics.
|
| Cabling
and Phantom Power - Cabling
is a common cause of recording problems. In order to avoid
noise (60 Hz hum) and phase problems, it is recommended to
use professional-quality balanced XLR cables (two conductors for
the signal, with neither connected to the shield) to
connect the microphone to the pre-amplifier. If the pre-amplifier
does not have balanced XLR inputs, one should use a balanced-to-unbalanced
transformer. The transformer’s primary
side matches the impedance of the microphone and is balanced,
while the secondary side is unbalanced and has high impedance
that matches most unbalanced pre-amplifier inputs. Condenser
microphones require 48 V phantom power. While some microphones,
such as the AKG C1000, can work from an internal battery source,
others require an external phantom power supply, which many
good pre-amplifiers and mixing consoles provide.
|
Pre-Amplifiers
- The main function of a pre-amplifier is to
accept a very low-level signal (such as that from a microphone)
and amplify it without adding noise. Good pre-amplifiers are
not easy to build, as they have to be immune to all kinds
of potential noise and signal distortion. It is, therefore,
important to use the best possible pre-amplifier, particularly
one that has a fairly high gain, broad dynamic range, high
SNR, phantom power, and balanced XLR inputs. Good field pre-amplifiers
are particularly hard to find. Sound Devices MP2, USB Pre,
and Shure FP 23 are among the best affordable
field pre-amplifiers. The gain on the pre-amplifier should
be set relatively high, but the speaker’s voice amplitude
range should be tested first to avoid signal overload. If
intensity measurements are not intended, a soft compressor-limiter
may be used to maximize amplitude and help prevent signal
clipping.
|