Stereo

  • Idea: Left and right stereo channels are highly correlated

  • Typical to take a stereo pair and turn it into a mono channel (l + r) / 2 and a side channel (l - r); see the sketch below

  • The side channel is typically low amplitude, and so can be compressed easily

  • Side benefit: mono channel is easily extracted
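
  • A minimal mid/side sketch; the function names and float samples here are my own assumptions, not from these notes. It shows the transform is exactly invertible and that correlated channels give a small side signal:

```python
import numpy as np

def ms_encode(left, right):
    """Mid/side transform: mid is the mono mix, side is the (usually small) difference."""
    mid = (left + right) / 2.0
    side = left - right
    return mid, side

def ms_decode(mid, side):
    """Exact inverse of ms_encode."""
    return mid + side / 2.0, mid - side / 2.0

# Highly correlated channels -> a tiny side channel that compresses easily.
left = np.sin(np.linspace(0, 2 * np.pi, 100))
right = 0.95 * left
mid, side = ms_encode(left, right)
l2, r2 = ms_decode(mid, side)
assert np.allclose(left, l2) and np.allclose(right, r2)
```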

Downsampling

  • Idea: Most audio has low amplitudes at higher frequencies

  • Downsample the signal and transmit that (toy sketch below)

  • Loss is pretty noticeable at high compression rates; maybe need some residue coding

  • The signal path may be band-limited anyhow: embedded devices, guitar pedals, etc

  • MP3 (discussed in a bit) is a surprisingly close cousin to this scheme
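
  • A toy sketch of the idea; the moving-average anti-alias filter and sample-and-hold reconstruction are my own crude stand-ins, not anything a real codec would use:

```python
import numpy as np

def downsample(x, factor):
    """Crude anti-alias (moving-average) filter, then keep every `factor`-th sample."""
    kernel = np.ones(factor) / factor
    return np.convolve(x, kernel, mode="same")[::factor]

def upsample(y, factor, length):
    """Naive sample-and-hold reconstruction, trimmed to the original length."""
    return np.repeat(y, factor)[:length]

fs = 48000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)          # 440 Hz tone, well below the new Nyquist
y = downsample(x, 6)                     # 48 kHz -> 8 kHz: 6x fewer samples
residue = x - upsample(y, 6, len(x))     # what "residue coding" would have to encode
```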

Companding

  • Idea: Small differences in large amplitudes matter less. In particular, human loudness perception is roughly logarithmic in amplitude

  • To best represent a signal in a fixed number of bits, squash the encoding so that there are fewer codes for larger amplitudes

  • Classic: 8-bit µ-Law, A-law

  • µ-Law: 14 bits in, 8 bits out

    • Continuous

      $$y(t) = \mathrm{sgn}(x(t)) \frac{\ln(1 + \mu |x(t)|)}{\ln(1 + \mu)}$$

      where µ = 255 and x(t) is normalized to [−1, 1]

    • The discrete (G.711) version is given by a piecewise-linear approximation table; a sketch of the continuous formula follows below
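
  • A sketch of the continuous formula above; the uniform 8-bit rounding is my own stand-in for the real G.711 table:

```python
import numpy as np

MU = 255.0

def mu_law_compress(x):
    """y = sgn(x) * ln(1 + mu*|x|) / ln(1 + mu), for x normalized to [-1, 1]."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def mu_law_expand(y):
    """Inverse: x = sgn(y) * ((1 + mu)^|y| - 1) / mu."""
    return np.sign(y) * (np.power(1.0 + MU, np.abs(y)) - 1.0) / MU

def quantize_8bit(y):
    """Round the companded value uniformly to 256 levels (a stand-in, not G.711)."""
    return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)

x = np.linspace(-1.0, 1.0, 9)
codes = quantize_8bit(mu_law_compress(x))            # 8-bit companded codes
x_hat = mu_law_expand(codes / 255.0 * 2.0 - 1.0)     # decode: expand back
# Small |x| keep much finer resolution than large |x| -- log-like code spacing.
```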

POTS

  • US Plain Ol' Telephone Service compression is downsampling to 8000 samples per second and then µ-Law encoding to 8 bits, so 64000 bps (quick check below)

  • Lossy, but turns out to be good enough to sound OK for voice

  • Originally implemented entirely in analog circuitry: the digital version replicates that behavior

  • The characteristic telephone sound is mostly due to this band-limiting and companding
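
  • A quick arithmetic check of the quoted bit rate (the variable names are mine):

```python
sample_rate_sps = 8000            # samples per second after downsampling
bits_per_sample = 8               # after mu-law companding
assert sample_rate_sps * bits_per_sample == 64000   # bits per second
```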

FLAC

  • Predict in the time domain using a fixed polynomial model or Linear Predictive Coding (LPC)

  • Encode the residue using Rice codes (a special case of Golomb codes, related to Huffman codes); toy sketch below

  • Reliable compression of about 2×

  • Remember: FLAC is lossless, so any noise in the signal must be compressed and recreated exactly as well
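
  • A toy sketch of the idea, not the real FLAC format: a fixed order-2 polynomial predictor plus Rice coding of the zigzag-folded residue (the parameter k = 3 is an arbitrary choice here):

```python
import numpy as np

def order2_residue(x):
    """Residue of the fixed order-2 predictor pred[n] = 2*x[n-1] - x[n-2]."""
    x = np.asarray(x, dtype=np.int64)
    res = x.copy()
    res[2:] = x[2:] - (2 * x[1:-1] - x[:-2])
    return res

def rice_encode(value, k):
    """Rice-code one signed residue: zigzag-fold, unary quotient, then k low bits."""
    u = 2 * value if value >= 0 else -2 * value - 1      # 0,-1,1,-2,... -> 0,1,2,3,...
    q, r = u >> k, u & ((1 << k) - 1)
    return "1" * q + "0" + format(r, f"0{k}b")            # assumes k >= 1

# Smooth signals give tiny residues, so most Rice codes are just a few bits.
samples = np.round(1000 * np.sin(np.arange(4096) * 0.01)).astype(np.int64)
bits = sum(len(rice_encode(int(v), k=3)) for v in order2_residue(samples))
print(f"{bits} bits coded vs {16 * len(samples)} bits raw")
```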

Lossy Compression à la MP3

  • Good Ars Technica MP3 tutorial

  • High-level view:

    • Split the input signal into a set of frequency bands using a "polyphase filter bank"

    • In each band:

      • Use an FFT to analyze what is actually in the signal (this drives decisions about what the ear will notice)

      • Use a (modified) DCT to get a power spectrum (noise subframes are handled specially)

      • Quantize the spectrum to reduce the number of bits (this introduces quantization noise, i.e., power errors)

      • Huffman-encode the quantized coefficients to get a compact representation (toy sketch after this list)

    • Combine all the compressed quantized coefficients to get a frame

  • The details are quite complex: see something like Ogg Vorbis for a cleaner version
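
  • A toy sketch of the per-band quantize-and-Huffman step; my own simplifications: a plain DCT-II in place of MP3's filter bank and MDCT, a fixed uniform quantizer step, and a from-scratch Huffman length computation:

```python
import heapq
from collections import Counter
import numpy as np

def dct2(frame):
    """Unnormalized DCT-II of one frame (MP3 really uses a filter bank + MDCT)."""
    n = len(frame)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi / n * (m + 0.5) * k) @ frame

def huffman_code_lengths(symbols):
    """Per-symbol Huffman code lengths via a simple heap merge of symbol counts."""
    counts = Counter(symbols)
    heap = [(c, i, {s: 0}) for i, (s, c) in enumerate(counts.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, d1 = heapq.heappop(heap)
        c2, i2, d2 = heapq.heappop(heap)
        heapq.heappush(heap, (c1 + c2, i2, {s: l + 1 for s, l in {**d1, **d2}.items()}))
    return heap[0][2]

fs = 44100
t = np.arange(576) / fs                                   # 576-sample granule, as in MP3
frame = np.sin(2 * np.pi * 440 * t) + 0.1 * np.sin(2 * np.pi * 3000 * t)

q = np.round(dct2(frame) / 4.0).astype(int)               # coarse uniform quantization
lengths = huffman_code_lengths(q.tolist())
total_bits = sum(lengths[s] for s in q.tolist())
print(f"~{total_bits} bits after quantize + Huffman vs {16 * len(q)} bits raw")
```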
