Sure-Fire Tips for Encoding High-Quality, Low-Bandwidth Audio, Part 2
Encoder Tricks
Tip 24: If you’re targeting 28.8 modem users, keep in mind that they really don’t have much bandwidth at their disposal. 28.8Kbps users will be lucky to have 16Kbps to 22Kbps available at any given time, so encode accordingly. Low-bandwidth targets give up proportionally more of their bandwidth to network overhead than higher-bandwidth targets do. Squeezing video into the stream means even fewer bits for audio. A good rule of thumb for 28.8Kbps and 56Kbps targets is 6.5Kbps for voice with video and 8Kbps for monaural music with video. At 64Kbps, the budget rises to 8.5Kbps for voice with video and the same for monaural music with video. At dual ISDN, 112Kbps, you can expect to have 16Kbps for voice, 16Kbps for monaural music and 20Kbps for stereo music, giving up the majority of the bandwidth to video. Even at the higher speeds, adding video to the stream cuts the bits for audio to about half of what they would be if audio were flying solo. Translation: a music video won’t sound as good as the music alone.
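The rule-of-thumb figures above can be collected into a small lookup, which makes it easy to see how small a share of the line audio gets once video is in the stream. This is only a sketch: the table name and the two helper functions are hypothetical, and the numbers are simply the ones quoted in this tip.

```python
# Rule-of-thumb audio budgets (Kbps) when video shares the stream.
# Keys are target connection speeds in Kbps; values come from Tip 24.
# The names used here are illustrative assumptions, not part of any encoder.
AUDIO_BUDGET_WITH_VIDEO = {
    28.8: {"voice": 6.5, "mono_music": 8.0},
    56.0: {"voice": 6.5, "mono_music": 8.0},
    64.0: {"voice": 8.5, "mono_music": 8.5},
    112.0: {"voice": 16.0, "mono_music": 16.0, "stereo_music": 20.0},
}


def audio_budget(target_kbps, content):
    """Return the suggested audio bitrate in Kbps, or None if the tip gives no figure."""
    return AUDIO_BUDGET_WITH_VIDEO.get(target_kbps, {}).get(content)


def audio_share(target_kbps, content):
    """Return the fraction of the nominal line speed that the audio budget uses."""
    budget = audio_budget(target_kbps, content)
    if budget is None:
        return None
    return budget / target_kbps


if __name__ == "__main__":
    for speed in sorted(AUDIO_BUDGET_WITH_VIDEO):
        for content, kbps in AUDIO_BUDGET_WITH_VIDEO[speed].items():
            share = audio_share(speed, content)
            print(f"{speed:>6} Kbps  {content:<13} {kbps:>5} Kbps  ({share:.0%} of the line)")
```

Note that the tip lists no stereo figure below 112Kbps, so the lookup returns None there rather than guessing one.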
Tidbit: For low bandwidth targets, almost all the codecs you’ll use will fall under the lossy category, meaning they’ll drop bits of audio information to make the squeeze. Lossless codecs require too many bits to push audio down 28.8, 56 and even 128Kbps lines.
Tidbit: The lossy codecs you’ll use for streaming rely on perceptual coding techniques. They toss out extreme highs and lows and take out frequencies that are masked by others within the waveform in order to send through the best perceived sound for the bits. Even when a codec is set for a wide frequency response, it won’t always give bit preference to the higher frequencies if the complexity of the content warrants sacrificing high frequencies in order to spend bits on mid-band frequencies.
Tip 25: Forgo stereo. Don’t get two channels of bad-sounding stereo. Get one reasonably good-sounding monaural channel. The general consensus is that 96Kbps is the cutoff target for decent-sounding stereo.
Tip 26: If you’re bound and determined to push stereo down 20Kbps, you might get away with it if there’s not a lot of stereo separation (the difference information between left and right channels) in the audio. Set your MP3 encoder to Joint Stereo instead of Dual Channel. Joint Stereo is preferable because it processes the sum of the channels and the difference information separately. Since the difference information is usually smaller, the codec doesn’t have to dedicate bits to an entire second channel, as it does with Dual Channel. WMA also has a stereo adaptive mode: it compares the samples on the two channels, decides which are common to both and which differ, and encodes only the difference separately instead of encoding two separate stereo channels.
Tidbit: Stereo has separate channels for each speaker, and monaural has one channel for both. To push that extra channel down 20Kbps, you’ll need to sacrifice frequency response. For example, RealAudio’s codecs for 20Kbps stereo give you frequencies up to 5kHz for stereo audio but twice that, 10kHz, for 20Kbps monaural.
Tidbit: The Joint Stereo setting takes advantage of the fact that most stereo sound fields have low difference information. It adds the left and right channels to form "L+R" and subtracts them to form "L-R". The L-R, or difference, channel usually has much less information than the L+R channel. Often, L-R only has midrange frequencies, and at much lower signal levels. This is because of the way music is mastered at recording time. Stereo music generally is only stereo at mid-frequencies because the ear is relatively insensitive to stereo separation at low frequencies, where the wavelengths are long, and is insensitive to high-frequency separation due to phase delay between one ear and the other.
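The sum and difference idea can be sketched in a few lines. This is not how any particular MP3 encoder implements Joint Stereo; it is a minimal illustration, with hypothetical function names, that scales both signals by one half (an assumption made here so the values stay in the original range) and shows that the original channels can be rebuilt from the two.

```python
def to_sum_difference(left, right):
    """Convert left/right samples into (L+R)/2 and (L-R)/2 signals."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_sum_difference(mid, side):
    """Rebuild the left/right samples from the sum and difference signals."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    """Sum of squared sample values, a rough measure of how much signal there is."""
    return sum(x * x for x in samples)


if __name__ == "__main__":
    # Two channels that are similar but not identical (made-up sample values).
    left = [0.5, 0.25, -0.5, 1.0]
    right = [0.5, 0.125, -0.25, 0.75]
    mid, side = to_sum_difference(left, right)
    print("sum energy:       ", energy(mid))
    print("difference energy:", energy(side))
    print("round trip ok:    ", from_sum_difference(mid, side) == (left, right))
```

When the two channels are similar, the difference signal carries very little energy, which is why a codec can spend most of its bits on the sum.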
Tip 27: Use your editor to do the stereo-to-mono conversion rather than your encoder. You’ll have more control over equalization, and you’ll be able to enhance and filter frequencies at the same time.
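For reference, the simplest stereo-to-mono conversion is an average of the two channels. The sketch below assumes 16-bit integer samples and uses a hypothetical function name; a real editor offers more control than this. It also shows one reason to listen to the result before encoding: material that is out of phase between the channels cancels when summed.

```python
def downmix_to_mono(left, right):
    """Average two equal-length lists of 16-bit integer samples into one mono list."""
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    # Averaging (rather than plain adding) keeps the result inside the 16-bit range.
    # Floor division rounds odd sums down by one step, which is inaudible at 16 bits.
    return [(l + r) // 2 for l, r in zip(left, right)]


if __name__ == "__main__":
    # Identical full-scale content in both channels survives unchanged.
    print(downmix_to_mono([32767, -32768], [32767, -32768]))
    # Content that is inverted in one channel cancels to silence.
    print(downmix_to_mono([1000, -1000], [-1000, 1000]))
```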
Tip 28: When encoding a live event, set the encoder and the capture card to the same sample rate, so there’s no unnecessary re-sampling. Often, you can set the sample frequency to default on the capture card. Most encoders will then tell the capture card to sample at the encoder’s native sample rate.
Tip 29: If you’re getting a "swooshing" sound after encoding, cut more high frequencies out of the audio track. The ear won’t miss these frequencies anyway, and filtering them helps avoid the aliasing (the swooshing sound) that encoding high-frequency content can cause.
Tip 30: If you’re getting "flanging," or a "hollow," swishy sound, reduce the codec compression, or reduce the frequency response and apply more aggressive low-pass filtering to the audio file. Some codecs, including MP3, will spread bits over the frequency spectrum, which sounds like flanging to the ear.
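The low-pass filtering suggested in Tips 29 and 30 is normally done in an audio editor, but the idea is easy to show. The sketch below is a first-order (one-pole) low-pass filter, which is far gentler than the filters an editor provides; the function names, the 4kHz cutoff and the 22.05kHz sample rate are assumptions picked purely for illustration.

```python
import math


def low_pass(samples, cutoff_hz, sample_rate_hz):
    """Apply a simple one-pole low-pass filter and return the filtered samples."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)  # smoothing factor between 0 and 1
    out = []
    prev = 0.0
    for x in samples:
        # Move the output a fraction of the way toward each new input sample.
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def sine(freq_hz, sample_rate_hz, count):
    """Generate `count` samples of a full-scale sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(count)]


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(x) for x in samples)


if __name__ == "__main__":
    rate = 22050
    low_tone = low_pass(sine(200, rate, 2000), 4000, rate)
    high_tone = low_pass(sine(9000, rate, 2000), 4000, rate)
    # Skip the first half so the filter has settled before measuring.
    print("200 Hz peak after filtering: ", round(peak(low_tone[1000:]), 3))
    print("9000 Hz peak after filtering:", round(peak(high_tone[1000:]), 3))
```

The low tone passes almost untouched while the high tone is cut substantially, which leaves the codec less high-frequency content to spend bits on.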
Tip 31: If you’re getting flanging on a stereo MP3 file, change the Joint Stereo setting to Mono.
Tip 32: Set the encoder preference to favor audio over video should there be an interruption or lag in bandwidth service. The Web audience is more likely to tolerate herky-jerky video than poor audio quality.
Tip 33: Getting rid of an unwanted sound. Sometimes you can isolate an unwanted noise and remove it if it’s concentrated in a small band of frequencies. If so, you can notch the band out with a notch filter (available with most editors). With more broadband sounds that are long in duration, such as hiss from wind or air moving through an air conditioning vent, you can apply a noise reduction algorithm that’ll get rid of the hiss between speech or music content. You can also try bandwidth limiting the clip in order to get rid of the hiss above and below the frequencies of the words or music content, although some hiss will remain as part of the speech or music. Just about any professional editor will have algorithms for noise reduction and limiting bandwidth.
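To show what notching out a narrow band means, here is a sketch of a standard second-order (biquad) notch filter. The 60Hz hum, the 8kHz sample rate, the Q value of 10 and the function names are all assumptions for the example; an editor's notch filter will expose similar controls for center frequency and width.

```python
import math


def notch(samples, notch_hz, sample_rate_hz, q=10.0):
    """Remove a narrow band around notch_hz using a biquad notch filter."""
    w0 = 2.0 * math.pi * notch_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)  # higher q gives a narrower notch
    a0 = 1.0 + alpha
    # Coefficients normalized so the leading feedback coefficient is 1.
    b0 = 1.0 / a0
    b1 = -2.0 * math.cos(w0) / a0
    b2 = 1.0 / a0
    a1 = -2.0 * math.cos(w0) / a0
    a2 = (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def sine(freq_hz, sample_rate_hz, count):
    """Generate `count` samples of a full-scale sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(count)]


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(x) for x in samples)


if __name__ == "__main__":
    rate = 8000
    hum = notch(sine(60, rate, rate), 60, rate)
    tone = notch(sine(1000, rate, rate), 60, rate)
    # Measure the last 1000 samples, after the filter has settled.
    print("60 Hz hum left after notch:    ", peak(hum[-1000:]))
    print("1000 Hz tone left after notch: ", peak(tone[-1000:]))
```

A narrow notch like this removes the hum while leaving content away from the notch frequency essentially intact, which is why it only works for noise concentrated in a small band.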