 |
48,000
Hz or 96,000 Hz/24-bit PCM audio files are not suitable for
acoustic analysis or delivery over the Internet. They should
be stored as master preservation copies of the recordings.
Such files should be described in common metadata terms (such
as Dublin Core, METS or OLAC) and stored on a dependable optical
storage medium, such as CD-R or DVD. The best storage formats
are uncompressed PCM digital audio files, such as Microsoft
WAV, the Broadcast Wave Format (BWF), or AIFF. |
| Preparing
Files for Analysis and Delivery - 96,000
Hz or 48,000 Hz/24-bit PCM audio files are not suitable for
acoustic analysis or online delivery. In order to prepare
files for acoustic analysis, several digital signal processing
(DSP) techniques should be used. First, the files have to
be downsampled to 11,025 Hz and their resolution should be
changed to 16 bits. The downsampling process should always
include the use of anti-aliasing filters, and the resolution
change should include dithering to minimize the effects of
quantization noise. Both Sonic Foundry Sound Forge 5.0 and
BIAS Peak VST offer adequate DSP tools for this. By the Nyquist
theorem, a file sampled at 11,025 Hz can represent frequencies only
up to half that value (about 5,512 Hz). This is adequate to analyze
most speech sounds. For female and child
voices, a higher sample rate (e.g., 22,050 Hz) may be used
to make sure that the spectrum retains higher-frequency transient
sounds.
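
The arithmetic behind these steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of what Sound Forge or Peak do internally: the function names are hypothetical, samples are assumed to be signed integers, and the dither is a standard triangular (TPDF) dither of plus or minus one 16-bit step. The sketch shows the Nyquist limit of a sample rate, the up/down ratio a resampler needs to get from 48,000 Hz to 11,025 Hz, and a dithered 24-bit to 16-bit word-length reduction. The anti-aliasing filter itself is not implemented here.

```python
import random
from fractions import Fraction


def nyquist_hz(sample_rate):
    """Highest frequency a given sample rate can represent."""
    return sample_rate / 2


def resample_ratio(source_rate, target_rate):
    """Return (up, down) integer factors for rational resampling."""
    ratio = Fraction(target_rate, source_rate)
    return ratio.numerator, ratio.denominator


def requantize_24_to_16(samples_24, rng=None):
    """Reduce signed 24-bit integer samples to 16 bits with TPDF dither."""
    rng = rng or random.Random()
    step = 1 << 8  # one 16-bit step spans 256 24-bit steps
    out = []
    for s in samples_24:
        # The difference of two uniform values is triangular noise in (-step, step).
        dither = (rng.random() - rng.random()) * step
        q = round((s + dither) / step)
        out.append(max(-32768, min(32767, q)))  # clip to the 16-bit range
    return out


if __name__ == "__main__":
    print(nyquist_hz(11025))             # 5512.5
    print(resample_ratio(48000, 11025))  # (147, 640)
    print(requantize_24_to_16([0, 128, -8388608, 8388607], random.Random(1)))
```

Because the ratio 147/640 is not a simple integer division, the resampler has to interpolate and low-pass filter the signal, which is why the anti-aliasing step cannot be skipped.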
|
Digital
Restoration - Often, linguists, oral
historians, anthropologists, and educators find themselves
dealing with old, noisy recordings. The digitization process
itself does not improve the quality of the recording, nor
does it remove any of its imperfections. However, there are
a few simple DSP techniques that can be used to clean up and
enhance the recording. The example below shows how several
DSP techniques have effectively removed some of the unwanted
noise from an old DARE recording. The original signal contained
low frequency noise whose spectrum overlapped with that of
speech, particularly around the F1 area. After the restoration
procedures were applied, the amplitude of the noise was
reduced by over 20 dB, which separated it from the speech
signal and made the file significantly more suitable
for reliable acoustic analysis. |
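
To put the 20 dB figure in perspective, the standard decibel relation for amplitudes, 20 * log10(ratio), can be evaluated directly. The helper names below are illustrative only and are not taken from any restoration tool.

```python
import math


def amplitude_ratio_to_db(ratio):
    """Convert an amplitude ratio to a level change in decibels."""
    return 20 * math.log10(ratio)


def db_to_amplitude_ratio(db):
    """Convert a level change in decibels to an amplitude ratio."""
    return 10 ** (db / 20)


if __name__ == "__main__":
    # A 20 dB reduction leaves one tenth of the original noise amplitude.
    print(db_to_amplitude_ratio(-20))
```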
|
Linear
Predictive Coding in Acoustic Analysis - Linear
Predictive Coding (LPC) is often used by linguists as a formant
extraction tool. There are a few important details about LPC
that may help avoid common analysis errors. LPC analysis assumes
that a signal is the output of a causal linear system. It
also assumes that the vocal-tract system is an all-pole filter
and that the input to the system is an impulse train. Because
of these assumptions, LPC analysis is usually most appropriate
for modeling vowels, which are periodic and for which the vocal-tract
resonator does not usually include zeroes (nasalized
vowels, which do introduce zeroes, are an exception). The order of an LPC model is the number of poles
in the filter. Usually, two poles are included for each formant,
plus 2-4 additional poles to represent the source characteristics.
For adult speakers, average formant spacing is in the 1000
Hz range for males and in the 1150 Hz range for females. The
LPC order is related to the sample rate of the audio file, since
a wider bandwidth contains more formants: at 10,000 Hz, LPC order = 12-14
(males) and 8-10 (females); at 22,050 Hz, LPC order = 24-26 (males) and 22-24 (females).
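
The rule of thumb above (two poles per formant plus 2-4 extra) can be written as a small calculation. This is a sketch under assumptions: the function name is hypothetical, and rounding the number of formants to the nearest integer is my choice, not something the text specifies. With these assumptions it reproduces the male ranges and the 22,050 Hz female range quoted above; for females at 10,000 Hz it gives 10-12 rather than 8-10, so the result is best treated as a starting point to be adjusted by inspection.

```python
def lpc_order_range(sample_rate, formant_spacing_hz, extra_poles=(2, 4)):
    """Rule-of-thumb LPC order range: two poles per formant plus a few extra."""
    bandwidth = sample_rate / 2  # Nyquist limit of the file
    # Rounding to the nearest whole formant is an assumption of this sketch.
    n_formants = round(bandwidth / formant_spacing_hz)
    low = 2 * n_formants + extra_poles[0]
    high = 2 * n_formants + extra_poles[1]
    return low, high


if __name__ == "__main__":
    for rate in (10000, 22050):
        for label, spacing in (("male", 1000), ("female", 1150)):
            print(rate, label, lpc_order_range(rate, spacing))
```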
LPC
usually requires a very clean speech sample to work with. Many
recordings made with omnidirectional microphones contain too
little speech detail and too much noise to yield reliable
LPC readings.
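
As a rough illustration of the all-pole model described above, here is a minimal sketch of the autocorrelation method with the Levinson-Durbin recursion, one common way to fit LPC coefficients. It is not necessarily what any particular analysis package does: the function names are hypothetical, no windowing or pre-emphasis is applied, and the demo signal is a synthetic single resonance excited by white noise rather than real speech.

```python
import math
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a sequence."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(x, order):
    """Return ([1, a1, ..., ap], residual energy) for the all-pole model 1/A(z).

    Assumes the signal is not silent (r[0] > 0).
    """
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def formant_from_pole_pair(a1, a2, sample_rate):
    """Resonance frequency (Hz) of 1 / (1 + a1 z^-1 + a2 z^-2)."""
    if a1 * a1 >= 4 * a2:
        raise ValueError("poles are real, so there is no resonance")
    theta = math.acos(-a1 / (2 * math.sqrt(a2)))  # pole angle in radians
    return theta * sample_rate / (2 * math.pi)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.0, 0.0]
    for _ in range(5000):
        # Two-pole resonator driven by white noise (synthetic data).
        x.append(1.3 * x[-1] - 0.8 * x[-2] + rng.gauss(0.0, 1.0))
    a, _ = lpc(x, 2)
    print(a, formant_from_pole_pair(a[1], a[2], 10000))
```

For real speech the model order would be much higher and the formants would be read from the roots of the full polynomial, which needs a numerical root finder not shown here.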
|