48,000 Hz or 96,000 Hz/24-bit PCM audio files are not suitable for acoustic analysis or delivery over the Internet. They should be stored as master preservation copies of the recordings. Such files should be described in common metadata terms (such as Dublin Core, METS, or OLAC) and stored on a dependable optical storage medium, such as CD-R or DVD. The best storage formats are uncompressed PCM digital audio files, such as Microsoft WAV, Broadcast Wave Format (BWF), or AIFF.
Preparing Files for Analysis and Delivery - 96,000 Hz or 48,000 Hz/24-bit PCM audio files are not suitable for acoustic analysis or online delivery. To prepare files for acoustic analysis, several digital signal processing (DSP) techniques should be used. First, the files have to be downsampled to 11,025 Hz and their resolution reduced to 16 bits. The downsampling process should always include anti-aliasing filters, and the resolution change should include dithering to minimize the effects of quantization noise. Both Sonic Foundry Sound Forge 5.0 and BIAS Peak VST offer adequate DSP tools for this. By the Nyquist theorem, a file sampled at 11,025 Hz can represent frequencies only up to half that value, about 5,512 Hz. This is adequate for analyzing most speech sounds. For female and child voices, a higher sample rate (e.g., 22,050 Hz) may be used to make sure that the spectrum contains higher-frequency transient sounds.
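The arithmetic behind these steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by Sound Forge or Peak: the function names are hypothetical, the dither is a simple triangular (TPDF) dither, and the anti-aliasing filter itself is left to whatever resampling tool is used. The sketch only computes the Nyquist limit, the exact resampling ratio, and a dithered 24-bit to 16-bit requantization.

```python
import random
from fractions import Fraction


def nyquist_hz(sample_rate_hz):
    """Highest frequency that can be represented at a given sample rate."""
    return sample_rate_hz / 2


def resample_ratio(source_rate_hz, target_rate_hz):
    """Exact target/source ratio, reduced to lowest terms."""
    return Fraction(target_rate_hz, source_rate_hz)


def requantize_24_to_16(samples_24bit, rng=None):
    """Reduce signed 24-bit integer samples to 16 bits with TPDF dither.

    One 16-bit step equals 256 24-bit steps, so triangular noise spanning
    plus or minus one 16-bit step is added before rounding.
    """
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch repeatable
    step = 256  # 2 ** (24 - 16)
    out = []
    for s in samples_24bit:
        # Difference of two uniform values has a triangular distribution.
        dither = (rng.random() - rng.random()) * step
        q = round((s + dither) / step)
        out.append(max(-32768, min(32767, q)))  # clip to the 16-bit range
    return out


print(nyquist_hz(11025))             # 5512.5
print(resample_ratio(48000, 11025))  # 147/640
print(resample_ratio(96000, 11025))  # 147/1280
```

The ratios show that going from 48,000 Hz to 11,025 Hz is not a simple integer decimation, which is one reason to rely on a dedicated resampler with built-in anti-aliasing rather than dropping samples by hand.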
Digital Restoration - Linguists, oral historians, anthropologists, and educators often find themselves dealing with old, noisy recordings. The digitization process itself does not improve the quality of a recording, nor does it remove any of its imperfections. However, there are a few simple DSP techniques that can be used to clean up and enhance the recording. The example below shows how several DSP techniques effectively removed some of the unwanted noise from an old DARE recording. The original signal contained low-frequency noise whose spectrum overlapped with that of speech, particularly around the F1 area. After the restoration procedures were applied, the amplitude of the noise decreased by over 20 dB, which separated it from the speech signal and made the file significantly more appropriate for reliable acoustic analysis.
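To put the 20 dB figure in perspective: amplitude in decibels follows 20 * log10(ratio), so a 20 dB reduction means the noise amplitude drops to one tenth of its original value. A minimal Python sketch of the conversion follows; the function names are illustrative and not taken from any tool mentioned here.

```python
import math


def db_to_amplitude_ratio(db):
    """Convert a level change in dB to a linear amplitude ratio."""
    return 10 ** (db / 20)


def amplitude_ratio_to_db(ratio):
    """Convert a linear amplitude ratio to a level change in dB."""
    return 20 * math.log10(ratio)


# A 20 dB cut leaves about one tenth of the original noise amplitude.
print(db_to_amplitude_ratio(-20))
```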
Linear Predictive Coding in Acoustic Analysis - Linear Predictive Coding (LPC) is often used by linguists as a formant extraction tool. There are a few important details about LPC that may help avoid common analysis errors. LPC analysis assumes that a signal is the output of a causal linear system. It also assumes that the vocal-tract system is an all-pole filter and that the input to the system is an impulse train. Because of these assumptions, LPC analysis is usually most appropriate for modeling vowels, which are periodic and for which the vocal-tract resonator does not usually include zeroes (as it does, e.g., in nasalized vowels). The order of an LPC model is the number of poles in the filter. Usually, two poles are included for each formant, plus 2-4 additional poles to represent the source characteristics. For adult speakers, average formant spacing is in the 1000 Hz range for males and in the 1150 Hz range for females. The LPC order is related to the sample rate of the audio file: at 10000 Hz, LPC order = 12-14 (males) and 8-10 (females); at 22050 Hz, LPC order = 24-26 (males) and 22-24 (females).
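The rule of thumb above (two poles per formant plus 2-4 extra poles) can be written as a small Python helper. This is a rough sketch under assumptions: the function name is hypothetical, and the number of formants is estimated by dividing the Nyquist frequency by the average formant spacing and rounding to the nearest whole number. Under these assumptions it reproduces the male ranges quoted above and the female range at 22050 Hz; for female speakers at 10000 Hz it yields 10-12 rather than the quoted 8-10, so treat the result as a starting point to be checked against the spectrum.

```python
def lpc_order_range(sample_rate_hz, formant_spacing_hz, extra_poles=(2, 4)):
    """Estimate a (low, high) LPC order range from the rule of thumb.

    Assumes roughly one formant per `formant_spacing_hz` below the Nyquist
    frequency, two poles per formant, and 2-4 extra poles for the source.
    """
    nyquist = sample_rate_hz / 2
    n_formants = round(nyquist / formant_spacing_hz)
    base = 2 * n_formants
    return base + extra_poles[0], base + extra_poles[1]


print(lpc_order_range(10000, 1000))  # male speakers, 10000 Hz
print(lpc_order_range(22050, 1000))  # male speakers, 22050 Hz
print(lpc_order_range(22050, 1150))  # female speakers, 22050 Hz
```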

LPC usually requires a very good speech sample to work with. Many recordings done with omnidirectional microphones contain too little speech detail and too much noise to ascertain reliable LPC readings.
