 |
The processing
of audio for the web is not nearly as complicated as you
may have been led to believe. In this section of The MATRIX
Sound Resource, the process has been broken down into its
central components and each component is then explained
in detail.
|
| Analog
to Digital Conversion and Digital Audio Signal Transfer
- In order to perform acoustic analysis on
recorded speech data or to deliver audio on-line, the audio
signal has to be converted into a digital audio file format,
such as Wav or Aiff. Analog recordings have to be digitized
and digital recordings need to be transferred to a personal
computer via a digital audio file transfer interface. This
is an important, yet often underestimated, stage in the process
of preparing audio data for analysis.
The main goal of A/D conversion (digitization) is to obtain the
best possible digital representation of the original analog
waveform. Without going into too much technical detail of
the digitization process, one should choose a sample rate
that will capture a broad range of frequencies and a bit-depth
that will allow a wide dynamic range and a negligible amount
of quantization noise. These goals can be achieved by means
of a premium-quality, stand-alone A/D converter operating
at a sample rate of at least 48,000 Hz with 24-bit resolution.
It is absolutely crucial not to use a PCI multimedia sound
card, as such cards are built from inferior-quality electronic components
and, more importantly, allow electrostatic noise and distortion
to leak into the captured acoustic signal:

[Figure: Spectrum of typical electrostatic noise generated by computer circuitry.]
The A/D converter, such as Lucid AD 9624, should offer a variety
of sample rates, oversampling, high quality anti-aliasing
filters, and AES/EBU and S/PDIF digital outputs. Both AES/EBU
(Audio Engineering Society/European Broadcasting Union) and
S/PDIF (Sony/Philips Digital Interface) are fairly common
on high-end digital audio devices. In addition, S/PDIF is
used on a variety of consumer-level products, such as CD players
and MiniDisc players. It is also a common interface used
on PCI digital I/O cards, which is why it is probably a better
choice for most digital audio transfer applications.
The analog playback device (such as TASCAM 122 mkIII) should be
connected to the A/D converter. One should make sure that
the output levels on the tape deck match the input levels
on the A/D converter. A balanced XLR line-level interface
(+24 dBu min. gain, +7 dBu max. gain, 65k ohm impedance) is
recommended. If the tape deck does not have this kind
of output interface, a signal level transformer (such as Ebtech
Line shifter) and a pre-amplifier should be
used.
The A/D converter needs to be connected to a PCI digital audio
I/O card (USB and FireWire interfaces are also becoming common),
such as Midiman Delta DiO 2496, via an S/PDIF interface. The digital
I/O card should be selected as the recording interface in
the audio recording software (such as Sonic Foundry Sound
Forge 5.0 on a PC or BIAS Peak VST on a Mac). The digital
audio signal should be captured with this software and saved
either as Wav (PC) or Aiff (Mac) file at the sample rate and
bit depth that the A/D converter was set to. It is also possible
to capture a digital audio signal directly into acoustic analysis
software, such as CSL or Praat, though this is not recommended
because specialized recording and processing
software offers considerably more control over the incoming
signal. It should also be mentioned that USB Pre may be used
as a high-quality, stand-alone A/D converter.
In this case the digital audio signal is transferred to a PC
via the USB interface, which eliminates the need to install
a separate PCI digital I/O card and makes it possible to capture
digital audio on a laptop. In addition, USB Pre has a pair
of tape-level inputs, to which a cassette deck can be directly
connected. |
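Since the captured file should carry the same sample rate and bit depth as the A/D converter, it can be worth checking the saved file's header. The following is a minimal sketch that assumes a plain PCM Wav file and uses Python's standard `wave` module; the function names `describe_wav` and `matches_converter` are hypothetical and introduced here only for illustration.

```python
import wave

def describe_wav(path):
    """Return (sample_rate_hz, bit_depth, channels) read from a WAV header."""
    with wave.open(path, "rb") as wav_file:
        sample_rate_hz = wav_file.getframerate()
        bit_depth = wav_file.getsampwidth() * 8  # bytes per sample -> bits
        channels = wav_file.getnchannels()
    return sample_rate_hz, bit_depth, channels

def matches_converter(path, expected_rate_hz, expected_bits):
    """True if the file was saved at the converter's rate and bit depth."""
    sample_rate_hz, bit_depth, _ = describe_wav(path)
    return sample_rate_hz == expected_rate_hz and bit_depth == expected_bits
```

For a converter set to 48,000 Hz and 24 bits, one would call `matches_converter("capture.wav", 48000, 24)`, where the file name is a placeholder. Files saved with an extended (non-plain-PCM) header may not be readable by older versions of the `wave` module.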
| Improving
Audio Digitization - There are a few
simple, yet important ways in which the quality of the digital
representation of an analog waveform can be improved.
1. Use a sample rate of 96,000 Hz
- In principle, if frequency response were the only issue,
there would be no advantage in moving to formats with higher
sampling rates. However, the evidence is otherwise. Direct
psychoacoustic comparisons of the same source material, recorded
and reproduced at 44.1 kS/s, 96 kS/s, and 192 kS/s show that there
is an advantage in going to the higher rates - it sounds better!
The most common comment is that such recordings have “better
spatial resolution”. What mechanism can be at work?
It seems unlikely that we have all suddenly developed ultrasonic
hearing capabilities.
Energy dispersion and anti-alias filtering
- Sharp filtering inevitably causes a ringing transient response
- the effect is referred to as the Gibbs phenomenon. The ringing
contains energy, and although the energy in the input transient
is concentrated at one time, the energy from the anti-alias
filter is spread over a much longer time - the audio picture
is "defocused". We might argue that the energy
is ultrasonic, but this is certainly not the case at 44.1
or 48 kS/s - our bandwidth constraints mean that to get good
anti-aliasing, we must filter as fast as we can, and only
pass the audio bandwidth. A high sample rate gives us the
extra bandwidth to contain the ringing (energy defocusing).
The audio DVD standard - In addition
to improved anti-aliasing and energy defocusing handling,
the 96,000 Hz sample rate is part of the new, emerging digital
audio standard, used in present-day recording studios, consumer
PCs (e.g., the new Sound Blaster Audigy cards), and the audio
DVD format.
2. Use 24-bit quantization
- For the sampling theorem to apply exactly, each sampled
amplitude value must exactly equal the true signal amplitude
at the sampling instant. Real ADCs do not achieve this level
of perfection. Normally, a fixed number of bits (binary digits)
is used to represent a sample value. Therefore, the infinite
set of values possible in the analog signal is not available
for the samples. In fact, if there are R bits in each sample,
exactly 2^R sample values are possible. For high-fidelity applications,
such as archival copies of analog recordings, 24 bits per
sample, so-called 24-bit resolution, should be used. The
difference between the analog signal and the closest sample
value is known as quantization error. Since it can be regarded
as noise added to an otherwise perfect sample value, it is
also often called quantization noise. The effect of quantization
noise is to limit the precision with which a real sampled
signal can represent the original analog signal. This inherent
limitation of the ADC process is often expressed as a Signal-to-Noise
ratio (SNR), the ratio of the average power in the analog
signal to the average power in the quantization noise. In
terms of the dB scale, the quantization SNR for uniformly
spaced sample levels increases by about 6 dB for each bit
used in the sample. For ADCs using R bits per sample and uniformly
spaced quantization levels, SNR = 6R - 5 (approximately).
Thus, for 16-bit encoding about 91 dB is possible. This is 20
to 30 dB better than the 60 dB to 70 dB that can be achieved
in analog audio cassette players using special noise reduction
techniques. A 24-bit encoding yields a theoretical SNR of about
139 dB; in practice, the achievable figure is limited by the
electronics of the hardware itself.
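The "about 6 dB per bit" rule can be checked numerically. The sketch below is an illustration under stated assumptions, not a model of any particular converter: it rounds a full-scale sine wave to R bits with a uniform quantizer and measures the resulting SNR. The function name, test-tone frequency, and sample count are arbitrary choices. For a full-scale sine the measured values should follow roughly 6R + 1.8 dB; the additive constant depends on the signal assumed, which is why it differs from the -5 in the approximation quoted above, while the slope per bit is the same.

```python
import math

def quantization_snr_db(bits, num_samples=20000):
    """Measure the SNR (dB) of a full-scale sine wave rounded to `bits` bits."""
    levels = 2 ** (bits - 1)  # quantization steps per unit of amplitude
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        # Test tone whose frequency is not a simple fraction of the sample rate
        x = math.sin(2 * math.pi * 0.1234567 * n)
        q = round(x * levels) / levels  # uniform quantizer
        signal_power += x * x
        noise_power += (x - q) ** 2     # quantization error power
    return 10 * math.log10(signal_power / noise_power)

if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(bits, "bits:", round(quantization_snr_db(bits), 1), "dB")
```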
3. Use appropriate anti-aliasing
filters - Simply put, aliasing is a kind of sampling
confusion that can occur during the digitization process.
It is a direct consequence of violating the sampling theorem.
The highest frequency in the input signal must not be higher
than the Nyquist frequency (half the sampling rate). If the input
contains higher frequencies, the sampler continues to produce samples at
a fixed rate, but the samples will create false information
in the form of alias frequencies. In practice, aliasing can
and should be overcome. The solution is rather straightforward.
The input signal must be band-limited with a low-pass (anti-aliasing)
filter that provides significant attenuation at the Nyquist
frequency. The most "archetypal" anti-aliasing filter
will have "brick-wall" characteristics with instantaneous
attenuation and a very steep slope. This results in unwanted
ringing-type effects and should be avoided. In practice, our
system should use an oversampling (see below) A/D converter
with a mild low-pass filter, high initial sampling frequency,
and decimation processing to obtain the desired output sampling frequency.
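To make the idea of alias frequencies concrete, the following sketch computes where a pure tone would appear if it were sampled without any anti-aliasing filter. The function name `alias_frequency` is hypothetical; the folding rule itself follows from the sampling theorem.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a pure tone appears after unfiltered sampling."""
    # Sampling cannot distinguish f from f + k * sample_rate
    folded = freq_hz % sample_rate_hz
    nyquist = sample_rate_hz / 2
    # Components above Nyquist are mirrored back into the base band
    return sample_rate_hz - folded if folded > nyquist else folded
```

For example, at a sample rate of 44,100 Hz a 30,000 Hz tone lies above the 22,050 Hz Nyquist frequency and would be recorded as a false 14,100 Hz tone, whereas a 10,000 Hz tone is unaffected.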
4. Dither - Dither is a
small amount of noise added to the audio signal before sampling.
This causes the audio signal to shift with respect to quantization
levels. Quantization error is thus decorrelated from the signal
and the effects of the quantization error become negligible.
Dither does not prevent the quantization error; instead, it
allows the system to encode amplitudes smaller than the least
significant bit.
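The claim that dither lets the system encode amplitudes smaller than the least significant bit can be illustrated with a small simulation. This is a sketch only: it assumes triangular (TPDF) dither spanning +/-1 LSB, which is one common choice and is not specified in the text above, and the function names are hypothetical.

```python
import random

def quantize(value_lsb, dither=False, rng=random):
    """Round a value (expressed in LSB units) to the nearest integer level."""
    if dither:
        # Assumed TPDF dither: sum of two uniform values, spanning +/-1 LSB
        value_lsb += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return round(value_lsb)

def average_output(value_lsb, dither, trials=20000, seed=0):
    """Average many quantized readings of the same constant input."""
    rng = random.Random(seed)
    total = sum(quantize(value_lsb, dither, rng) for _ in range(trials))
    return total / trials
```

Without dither, a constant input of 0.3 LSB always rounds to 0 and is lost. With dither, the individual outputs jump between neighboring levels, but their average approaches 0.3, so the sub-LSB amplitude survives as a signal buried in low-level noise.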
5. Oversampling - Oversampling
is another technique aimed at improving the results of the
digitization process. As noted above, a brick-wall filter
may produce unwanted acoustic effects. In oversampling A/D
conversion, the input signal is first passed through a mild
low-pass filter, which provides sufficient attenuation at
high frequencies. To extend the Nyquist frequency, the signal
is then sampled at a high frequency and quantized. Afterwards,
a digital low-pass filter (e.g., a linear-phase FIR filter)
is used to prevent aliasing when the signal is
downsampled to achieve the desired output sampling frequency
(e.g., 44,100 Hz). In addition to eliminating unwanted effects
of a brick-wall analog filter, oversampling helps achieve
increased resolution by extending the spectrum of the quantization
error far beyond the audio base-band, rendering the in-band
noise relatively insignificant.
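The digital filtering and downsampling stage described above can be sketched as follows. This is a simplified illustration, not the design used in any particular converter: it assumes a Hamming-windowed sinc FIR low-pass filter (symmetric, hence linear-phase) with its cutoff placed slightly below the output Nyquist frequency, followed by keeping every `factor`-th sample. All names and parameter values are illustrative.

```python
import math

def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; `cutoff` is a fraction of the input rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [tap / gain for tap in taps]  # normalize to unity gain at DC

def decimate(samples, factor, num_taps=101):
    """Low-pass filter the input, then keep every `factor`-th sample."""
    # Cutoff at 90% of the output Nyquist frequency (0.5 / factor)
    taps = lowpass_fir(num_taps, cutoff=0.9 * 0.5 / factor)
    out = []
    for i in range(num_taps - 1, len(samples), factor):
        block = samples[i - num_taps + 1 : i + 1]
        out.append(sum(h * x for h, x in zip(taps, reversed(block))))
    return out
```

With a signal sampled at 176,400 Hz, `decimate(samples, 4)` would yield a 44,100 Hz stream in which content above roughly 22 kHz has been strongly attenuated before the sample rate is reduced. Only the output samples that are kept are computed, which is the usual economy of a decimating filter.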
6. Use high-quality, no-compromise hardware and software.
|
Audio digitization workflow:

- Prepare the analog original (use appropriate techniques, such
  as tone calibration, restoration, baking, etc.)
- Route the signal through a mixing board AUX send/return or Channel
  insert/return into a compressor/limiter - only for "soft"
  tapes with significant amplitude variability.
- Send the signal to a stand-alone A/D converter
- Send the digital audio data stream to a PCI capture card via
  AES/EBU or S/PDIF
- Make 2 CD-ROM (ISO standard) copies of the master file
- Initiate script to process and move the "working" audio
  files into the digital repository.

The batch script may include:
- downsampling to 22,050 Hz
- changing the resolution to 16 bit with dither and noise shaping
- 2:1 compression at -18 dB
- noise reduction
- RealAudio encoding
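As an illustration of the compression step in the batch script, the sketch below computes the static input/output curve of a compressor. It reads "2:1 compression at -18 dB" as a 2:1 ratio applied above a -18 dB threshold, which is an assumption about the intended settings; attack and release behavior is ignored, and the function name is hypothetical.

```python
def compress_level_db(level_db, threshold_db=-18.0, ratio=2.0):
    """Static compressor curve: output level (dB) for a given input level (dB)."""
    if level_db <= threshold_db:
        return level_db  # below the threshold the signal is left unchanged
    # Above the threshold, each extra dB of input yields 1/ratio dB of output
    return threshold_db + (level_db - threshold_db) / ratio
```

Under these assumptions, an input peak at -6 dB is 12 dB over the threshold and comes out at -12 dB, while material at -30 dB passes through unaltered.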
|
|
|