CS 410P/510 Sound Sp2021: Compression | psu-cs

A Look At A Sample

Let's consider a 16-bit audio sample:
```
00 010100000101 01
```
Sample consists of
- Headroom: Not recorded at maximum amplitude to avoid clipping
- Signal: The actual audio
- Noise: The low-order bits are typically garbage
The signal and noise blend together
As signals are manipulated, more noise creeps up into the signal bits because addition and multiplication

Clipping

Artifact of fixed-range representation of PCM sample: floating-point samples are basically unclippable
If the amplitude to be represented goes over the max possible value or under the min, what to do?
Not much except clamp (clip) the sample as close as you can get it
Net effect: tops clipped off waves
This happens in analog systems also because max/min voltages
Discontinuity introduces harmonics: bad distortion
This is why headroom

Noise

Several kinds of common audio noise
- Uniform "white noise": easy to make with a computer
- "Pink noise" that rolls off linearly with frequency ("1/f noise", "flicker noise")
- "Brownian noise" ("1/f^2 noise") from random walk in time domain
- Check out this Wikipedia article on noise colors

Audio Compression — Model and Residue

Idea: Build a simplified model of the audio signal — requires less space to represent.
Maybe the model is good enough for purpose: lossy compression
Otherwise send the model along with the residue — the difference between the model and the true signal: lossless compression
The residue might be a bit compressible too

Audio Compression — Model

Goal: build a simple parameterized approximate model of the audio signal
Time domain model
- Audio signals tend to be "smooth" continuous and have continuous derivatives
- Model the signal as lines or polynomials or splines
Frequency domain model
- Audio signals tend to be periodic; frequencies tend to vary slowly
- Model the spectrum with quantization in frequency and amplitude
- Phase is confusing

Audio Compression — Residue

The residue can often look like noise
Still, the amplitude of the residue should be small relative to the signal amplitude
Common compression schemes — Huffman coding, Rice coding, arithmetic coding — take advantage of frequency distribution of residue

Lossless vs Lossy Compression

Lossless (e.g. FLAC) is going to be limited for a lot of kinds of sounds. The fancier the model, the more kinds of sounds that can be compressed well
Doing lossy well is harder, because mustn't throw away stuff that wrecks the sound. Psychoacoustics is needed. Tends to be done in frequency domain; models are generalized

Last modified: Monday, 20 April 2020, 12:32 PM