The processing of audio for the web is not nearly as complicated as you may have been led to believe. In this section of The MATRIX Sound Resource, the process has been broken down into its central components and each component is then explained in detail.

Analog to Digital Conversion and Digital Audio Signal Transfer - In order to perform acoustic analysis on recorded speech data or to deliver audio on-line, the audio signal has to be converted into a digital audio file format, such as Wav or Aiff. Analog recordings have to be digitized and digital recordings need to be transferred to a personal computer via a digital audio file transfer interface. This is an important, yet often underestimated, stage in the process of preparing audio data for analysis.

The main goal of A/D conversion (digitization) is to obtain the best possible digital representation of the original analog waveform. Without going into too much technical detail of the digitization process, one should choose a sample rate that will capture a broad range of frequencies and a bit depth that will allow a wide dynamic range and a negligible amount of quantization noise. These goals can be achieved by means of a premium-quality, stand-alone A/D converter operating at a sample rate of at least 48,000 Hz and a resolution of 24 bits. It is crucial not to use a PCI multimedia sound card: such cards are built from inferior-quality electronic components and, more importantly, allow electrostatic noise and distortion to leak into the captured acoustic signal:

Spectrum of typical electrostatic noise generated by computer circuitry.

The A/D converter, such as the Lucid AD 9624, should offer a variety of sample rates, oversampling, high-quality anti-aliasing filters, and AES/EBU and S/PDIF digital outputs. Both AES/EBU (Audio Engineering Society/European Broadcasting Union) and S/PDIF (Sony/Philips Digital Interface) are fairly common on high-end digital audio devices. In addition, S/PDIF is used on a variety of consumer-level products, such as CD players and MiniDisc players. It is also a common interface on PCI digital I/O cards, which is why it is probably the better choice for most digital audio transfer applications.

The analog playback device (such as the TASCAM 122 mkIII) should be connected to the A/D converter. One should make sure that the output levels on the tape deck match the input levels on the A/D converter. It is recommended to use a balanced XLR line-level interface (+24 dBu min. gain, +7 dBu max. gain, 65k ohm impedance). If the tape deck does not have this kind of output interface, a signal level transformer (such as the Ebtech Line shifter) and a pre-amplifier should be used.

The A/D converter needs to be connected via an S/PDIF interface to a digital audio I/O card, such as the PCI-based Midiman Delta DiO 2496 (USB and FireWire interfaces are also becoming common). The digital I/O card should be selected as the recording interface in the audio recording software (such as Sonic Foundry Sound Forge 5.0 on a PC or BIAS Peak VST on a Mac). The digital audio signal should be captured with this software and saved as either a Wav (PC) or an Aiff (Mac) file at the sample rate and bit depth that the A/D converter was set to. It is also possible to capture the digital audio signal directly into acoustic analysis software, such as CSL or Praat, though this is not recommended because specialized recording and processing software offers considerably more control over the incoming signal. It should also be mentioned that the USB Pre may be used as a high-quality, stand-alone A/D converter.

In this case the digital audio signal is transferred to a PC via the USB interface, which eliminates the need to install a separate PCI digital I/O card and makes it possible to capture digital audio on a laptop. In addition, USB Pre has a pair of tape-level inputs, to which a cassette deck can be directly connected.
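Whichever capture route is used, it is worth confirming that the saved file really has the sample rate and bit depth the A/D converter was set to. The following is a minimal sketch using Python's standard wave module; the helper names wav_format and matches_converter are made up for this illustration, and the in-memory demo file simply stands in for a real capture.

```python
import io
import wave

def wav_format(source):
    """Return (sample_rate_hz, bit_depth, channels) for a WAV file or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()

def matches_converter(source, expected_rate_hz, expected_bits):
    """True if the file was saved at the converter's sample rate and bit depth."""
    rate, bits, _ = wav_format(source)
    return rate == expected_rate_hz and bits == expected_bits

# Demo only: build a tiny 48 kHz / 24-bit stereo file in memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as demo:
    demo.setnchannels(2)
    demo.setsampwidth(3)                      # 3 bytes per sample = 24 bits
    demo.setframerate(48000)
    demo.writeframes(b"\x00" * 3 * 2 * 10)    # ten silent stereo frames

buffer.seek(0)
print(wav_format(buffer))
```

With a real capture, a file path can be passed in place of the in-memory buffer.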

Improving Audio Digitization - There are a few simple, yet important ways in which the quality of the digital representation of an analog waveform can be improved.

1. Use a sample rate of 96,000 Hz - In principle, if frequency response were the only issue, there would be no advantage in moving to formats with higher sampling rates. However, the evidence is otherwise. Direct psychoacoustic comparisons of the same source material, recorded and reproduced at 44.1 kS/s, 96 kS/s, and 192 kS/s, show that there is an advantage in going to the higher rates - it sounds better! The most common comment is that such recordings have "better spatial resolution". What mechanism can be at work? It seems unlikely that we have all suddenly developed ultrasonic hearing capabilities.

Energy dispersion and anti-alias filtering - Sharp filtering inevitably causes a ringing transient response - the effect is referred to as the Gibbs phenomenon. The ringing contains energy, and although the energy in the input transient is concentrated at one time, the energy from the anti-alias filter is spread over a much longer time - the audio picture is "defocused". We might argue that the ringing energy is ultrasonic, but this is certainly not the case at 44.1 or 48 kS/s - our bandwidth constraints mean that to get good anti-aliasing, we must filter as sharply as we can and pass only the audio bandwidth. A high sample rate gives us the extra bandwidth to contain the ringing (energy defocusing).

The audio DVD standard - In addition to improved anti-aliasing and energy defocusing handling, the 96,000 Hz sample rate is part of the new, emerging digital audio standard, used in present-day recording studios, consumer PCs (e.g., the new Sound Blaster Audigy cards), and the audio DVD format.

2. Use 24-bit quantization - For the sampling theorem to apply exactly, each sampled amplitude value must exactly equal the true signal amplitude at the sampling instant. Real ADCs do not achieve this level of perfection. Normally, a fixed number of bits (binary digits) is used to represent a sample value. Therefore, the infinite set of values possible in the analog signal is not available for the samples. In fact, if there are R bits in each sample, exactly 2^R sample values are possible. For high-fidelity applications, such as archival copies of analog recordings, 24 bits per sample, or so-called 24-bit resolution, should be used. The difference between the analog signal and the closest sample value is known as quantization error. Since it can be regarded as noise added to an otherwise perfect sample value, it is also often called quantization noise. The effect of quantization noise is to limit the precision with which a real sampled signal can represent the original analog signal. This inherent limitation of the ADC process is often expressed as a signal-to-noise ratio (SNR), the ratio of the average power in the analog signal to the average power in the quantization noise. In terms of the dB scale, the quantization SNR for uniformly spaced sample levels increases by about 6 dB for each bit used in the sample. For ADCs using R bits per sample and uniformly spaced quantization levels, SNR = 6R - 5 dB (approximately). Thus, for 16-bit encoding about 91 dB is possible. This is 20 to 30 dB better than the 60 dB to 70 dB that can be achieved in analog audio cassette players using special noise reduction techniques. A 24-bit encoding yields a theoretical SNR of about 139 dB; in practice, the achievable figure is limited by the electronics of the hardware itself.
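The arithmetic behind these figures can be sketched in a few lines. The function names below are made up for this illustration, and the SNR function is simply the approximate 6R - 5 rule of thumb quoted above, not a measurement of any real converter.

```python
def quantization_levels(bits):
    """Number of distinct sample values available with `bits` bits per sample."""
    return 2 ** bits

def approx_snr_db(bits):
    """Rule-of-thumb quantization SNR from the text: about 6R - 5 dB."""
    return 6 * bits - 5

CASSETTE_SNR_DB = (60, 70)   # range quoted in the text for analog cassette decks

for bits in (8, 16, 24):
    print(f"{bits:2d} bits: {quantization_levels(bits):>10,d} levels, "
          f"about {approx_snr_db(bits)} dB SNR")

# Advantage of 16-bit encoding over the cassette range quoted above
advantage_db = [approx_snr_db(16) - snr for snr in CASSETTE_SNR_DB]
print("16-bit advantage over cassette (dB):", advantage_db)
```

Each additional bit doubles the number of levels and adds roughly 6 dB, which is why moving from 16 to 24 bits adds about 48 dB under this rule.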

3. Use appropriate anti-aliasing filters - Simply put, aliasing is a kind of sampling confusion that can occur during the digitization process. It is a direct consequence of violating the sampling theorem. The highest frequency in a sampling system must not be higher than the Nyquist frequency (half the sampling frequency). When the input contains frequencies above Nyquist, the sampler continues to produce samples at its fixed rate, but those samples carry false information in the form of alias frequencies. In practice, aliasing can and should be overcome. The solution is rather straightforward. The input signal must be band-limited with a low-pass (anti-aliasing) filter that provides significant attenuation at the Nyquist frequency. The most "archetypal" anti-aliasing filter has "brick-wall" characteristics, with near-instantaneous attenuation and a very steep slope. Such a filter produces unwanted ringing-type effects and should be avoided. In practice, the system should use an oversampling (see below) A/D converter with a mild low-pass filter, a high initial sampling frequency, and decimation processing to bring the signal down to the output sampling frequency.
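To make the idea of an alias frequency concrete, the sketch below computes where a pure tone ends up when it is sampled without any anti-aliasing filter, and checks that the tone and its alias produce identical samples. The function names and the 30 kHz test tone are illustrative choices, not part of the original workflow.

```python
import math

def nyquist_hz(sample_rate_hz):
    """The Nyquist frequency is half the sampling frequency."""
    return sample_rate_hz / 2

def alias_hz(tone_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling with no anti-aliasing filter."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

RATE = 44_100
TONE = 30_000                      # above the 22,050 Hz Nyquist frequency
alias = alias_hz(TONE, RATE)
print(nyquist_hz(RATE), alias)

# The sampled values of the tone and of its alias are indistinguishable.
identical = all(
    math.isclose(
        math.cos(2 * math.pi * TONE * n / RATE),
        math.cos(2 * math.pi * alias * n / RATE),
        abs_tol=1e-9,
    )
    for n in range(100)
)
print(identical)
```

Once the samples are taken, nothing distinguishes the false tone from a genuine one, which is why the band-limiting has to happen before sampling.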

4. Dither - Dither is a small amount of noise added to the audio signal before sampling. This causes the audio signal to shift with respect to quantization levels. Quantization error is thus decorrelated from the signal, and the audible effects of the quantization error become negligible. Dither does not prevent the quantization error; instead, it allows the system to encode amplitudes smaller than the least significant bit.
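The claim that dither lets a system encode amplitudes below the least significant bit can be illustrated with a small simulation. This is only a sketch under stated assumptions: values are expressed in units of one LSB, the dither is triangular noise spanning plus or minus one LSB (a commonly used shape, not one specified above), and all names are made up for the example.

```python
import math
import random

def quantize(value_lsb):
    """Round to the nearest quantization level (values are in units of one LSB)."""
    return math.floor(value_lsb + 0.5)

def tpdf_dither(rng):
    """Triangular noise spanning -1..+1 LSB (an assumed dither shape)."""
    return rng.random() - rng.random()

def average_quantized(value_lsb, count, rng=None):
    """Average of many quantized samples; dither is added only if rng is given."""
    total = 0
    for _ in range(count):
        noise = tpdf_dither(rng) if rng is not None else 0.0
        total += quantize(value_lsb + noise)
    return total / count

rng = random.Random(0)     # fixed seed so the run is repeatable
signal = 0.3               # a constant level smaller than one LSB

undithered = average_quantized(signal, 20_000)
dithered = average_quantized(signal, 20_000, rng)
print(undithered, dithered)
```

Without dither every sample rounds to zero and the signal is lost; with dither the individual samples are noisy, but their average settles close to the true sub-LSB level.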

5. Oversampling - Oversampling is another technique aimed at improving the results of the digitization process. As noted above, a brick-wall filter may produce unwanted acoustic effects. In oversampling A/D conversion, the input signal is first passed through a mild low-pass filter, which provides sufficient attenuation at high frequencies. To extend the Nyquist frequency, the signal is then sampled at a high frequency and quantized. Afterwards, a digital low-pass filter (e.g., a linear-phase FIR filter) prevents aliasing when the signal is downsampled to the desired output sampling frequency (e.g., 44,100 Hz). In addition to eliminating the unwanted effects of a brick-wall analog filter, oversampling helps achieve increased resolution by spreading the spectrum of the quantization error far beyond the audio base-band, rendering the in-band noise relatively insignificant.
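The digital filter and downsampling stage can be sketched as follows. This is an illustration only, not the design of any particular converter: the 4x oversampled rate, the 20 kHz cutoff, the 101-tap Hamming-windowed FIR, and all function names are assumptions chosen to keep the example short.

```python
import math

def lowpass_fir(cutoff_hz, sample_rate_hz, num_taps=101):
    """Hamming-windowed sinc coefficients for a linear-phase low-pass FIR filter."""
    fc = cutoff_hz / sample_rate_hz            # cutoff in cycles per sample
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]            # normalize to unity gain at DC

def decimate(samples, taps, factor):
    """Low-pass filter, then keep every `factor`-th output sample."""
    last_start = len(samples) - len(taps)
    return [
        sum(t * samples[start + i] for i, t in enumerate(taps))
        for start in range(0, last_start + 1, factor)
    ]

def tone(freq_hz, sample_rate_hz, count):
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

OVERSAMPLED_RATE = 176_400                     # 4 x 44,100 Hz
FACTOR = 4
taps = lowpass_fir(20_000, OVERSAMPLED_RATE)

in_band = decimate(tone(1_000, OVERSAMPLED_RATE, 4096), taps, FACTOR)
out_of_band = decimate(tone(60_000, OVERSAMPLED_RATE, 4096), taps, FACTOR)
print(rms(in_band), rms(out_of_band))
```

The 1 kHz tone passes through essentially unchanged, while the 60 kHz tone, which would otherwise fold back into the audio band at 44,100 Hz, is strongly attenuated before the sample rate is reduced.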

6. Use high-quality, no-compromise hardware and software.

Audio digitization workflow:

  1. Prepare the analog original (use appropriate techniques, such as tone calibration, restoration, baking, etc.)
  2. Route the signal through a mixing board AUX send/return or Channel insert/return into a compressor/limiter - only for "soft" tapes with significant amplitude variability.
  3. Send the signal to a stand-alone A/D converter
  4. Send the digital audio data stream to a PCI capture card via AES/EBU or S/PDIF
  5. Make 2 CD-ROM (ISO standard) copies of the master file
  6. Initiate script to process and move the "working" audio files into the digital repository.

The batch script may include:

  • downsampling to 22,050 Hz
  • changing the resolution to 16 bit with dither and noise shaping
  • 2:1 compression at -18 dB
  • noise reduction
  • RealAudio encoding
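As an illustration of one of these steps, the "2:1 compression at -18 dB" setting can be read as a static gain curve: levels below the threshold pass unchanged, and every 2 dB of input above it yields 1 dB of output. The sketch below assumes levels in dB relative to full scale and ignores attack and release behavior, which a real compressor would add; the function names are made up for the example.

```python
import math

def compress_level_db(level_db, threshold_db=-18.0, ratio=2.0):
    """Static compressor curve: reduce the portion of the level above the threshold."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

def compress_sample(sample, threshold_db=-18.0, ratio=2.0):
    """Apply the static curve to one sample in the range -1.0..1.0 (full scale = 1.0)."""
    magnitude = abs(sample)
    if magnitude == 0.0:
        return 0.0
    level_db = 20 * math.log10(magnitude)
    out_db = compress_level_db(level_db, threshold_db, ratio)
    return math.copysign(10 ** (out_db / 20), sample)

for level in (-30.0, -18.0, -6.0, 0.0):
    print(level, "->", compress_level_db(level))
```

A full-scale peak at 0 dB comes out at -9 dB under this curve, while quiet passages below -18 dB are left untouched.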
