Digital Sound


  • Idea: Discretize an analog signal in time and amplitude as "samples"

  • Simplest to use uniform sampling time (discretization in time), fixed-size binary representation of sample values (discretization in amplitude)

    • High sampling rates and lots of bits are more accurate, but "wasteful" for slowly-varying or low-amplitude signals
  • This is often called "Pulse Code Modulation" (PCM), and is the basis of most "time-domain" digital representations of sound

  • Usually PCM is "Linear Pulse Code Modulation" (LPCM): the binary amplitude values are interpreted directly. Sometimes a function is used to transform the sample values (e.g. A-law, μ-law) to try to use fewer bits per sample with a decent representation: see below
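
The two discretization steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names (sample_sine, quantize, dequantize), the 440 Hz test tone and the 8000 sps rate are all assumptions chosen for the example, and the quantizer uses a simple symmetric scaling to the largest positive code.

```python
import math

def sample_sine(freq_hz, rate_sps, n_samples, amplitude=1.0):
    """Discretize in time: evaluate a sine every 1/rate_sps seconds."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * n / rate_sps)
            for n in range(n_samples)]

def quantize(x, bits=16):
    """Discretize in amplitude: map a float in [-1.0, 1.0] to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1      # 32767 for 16 bits, 127 for 8 bits
    x = max(-1.0, min(1.0, x))          # clip anything out of range
    return round(x * max_code)

def dequantize(code, bits=16):
    """Map an integer code back to a float; the rounding error is not recoverable."""
    max_code = 2 ** (bits - 1) - 1
    return code / max_code

# Hypothetical example: eight samples of a 440 Hz tone at 8000 sps, 8-bit LPCM.
samples = sample_sine(440.0, 8000, 8)
codes = [quantize(s, bits=8) for s in samples]
```

The difference between a sample and dequantize(quantize(sample)) is the quantization error: with this scheme it is at most half a code step, so each extra bit halves it.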

Advantages of Discretization

  • Somewhat noise-immune: small variations in amplitude will be deliberately lost by the process

  • Can be perfectly stored, transmitted and reproduced (all loss is up-front)

  • Can be manipulated with a computer: simple compared to analog

  • Audiophiles hate it

Nyquist Limit

  • Sound is a fundamentally frequency-domain (sum of sinusoids) thing: PCM treats it as time-domain

  • A particularly striking example of this is the "Nyquist Limit"

  • To make PCM work well, we need to ensure that we don't try to represent signals that vary quickly relative to the sampling rate

  • Specifically, we need to ensure that frequencies above half the sample rate are not present in the underlying signal (this is a strange way of putting things, but the math checks out)

  • We will return to this topic throughout the course
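
A small numeric experiment shows why frequencies above half the sample rate must be excluded. The sketch below assumes an 8000 sps sample rate (so the limit is 4000 Hz) and compares a 1000 Hz sine with a 7000 Hz sine; the names RATE and sample_sine are made up for the example.

```python
import math

RATE = 8000  # samples per second (assumed for illustration); Nyquist limit is 4000 Hz

def sample_sine(freq_hz, n_samples, rate_sps=RATE):
    """Return n_samples of a unit-amplitude sine sampled at rate_sps."""
    return [math.sin(2.0 * math.pi * freq_hz * n / rate_sps)
            for n in range(n_samples)]

low = sample_sine(1000, 16)    # below the limit
high = sample_sine(7000, 16)   # above the limit

# 7000 = 8000 - 1000, so at the sample instants
# sin(2*pi*7000*n/8000) = sin(2*pi*n - 2*pi*1000*n/8000) = -sin(2*pi*1000*n/8000).
# The 7000 Hz samples are just the 1000 Hz samples with the sign flipped.
aliased = all(math.isclose(h, -l, abs_tol=1e-9) for h, l in zip(high, low))
print(aliased)
```

Once sampled, the 7000 Hz tone cannot be told apart from an inverted 1000 Hz tone, which is why the high frequencies have to be removed before sampling rather than after.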

PCM Representation

  • Sound is represented as sequence of samples: numbers

  • There is some specified sample rate in samples per second (sps): note that half the sample rate is the max representable frequency in Hz, because Nyquist

    • For the human frequency range, 44100 sps (22050 Hz) is more than adequate

    • Typical low-end music gear runs at 24000 sps (12000 Hz), or the equivalent analog bandwidth

    • Voice can be adequately reproduced to be intelligible at 8000 sps (4 kHz). This is the Plain Old Telephone Service (POTS) standard

  • For recording / playback, usually fixed-width integers: signed or unsigned

    • For the dynamic range of human hearing, 16 bits is plenty

    • Sound is quite intelligible even at 8 bits, especially if encoded / decoded non-linearly on the front and back (A-law, μ-law) to compress louder amplitudes

      • POTS, old video games, etc
    • Units are complicated: usually just normalized to range of values

  • Internal calculations are different

    • Sample rate of 96000 sps is not uncommon, to give room for frequency domain calculations at the high end (discussed later in the course)

    • Floating point samples are really useful for convenience and to avoid overflow in calculations. 0.0..1.0 is common, as is -1.0..1.0.
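
The non-linear 8-bit encoding mentioned above can be sketched with the continuous μ-law curve, F(x) = sgn(x) ln(1 + μ|x|) / ln(1 + μ) with μ = 255. This is an illustration only: real telephony codecs use a segmented, bit-exact approximation of this curve, and the helper names and the 127-level scaling below are assumptions made for the example.

```python
import math

MU = 255.0  # the usual mu-law parameter

def mu_law_compress(x, mu=MU):
    """Map a float in [-1.0, 1.0] through the mu-law curve (result also in [-1.0, 1.0])."""
    x = max(-1.0, min(1.0, x))
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)

def encode_8bit(x):
    """Compress, then quantize to a signed code in -127..127 (simplified, not bit-exact telephony)."""
    return round(mu_law_compress(x) * 127)

def decode_8bit(code):
    """Undo encode_8bit, up to quantization error."""
    return mu_law_expand(code / 127)

# Compare the error on a quiet sample against plain linear 8-bit quantization.
quiet = 0.01
mu_error = abs(decode_8bit(encode_8bit(quiet)) - quiet)
linear_error = abs(round(quiet * 127) / 127 - quiet)
```

For the quiet sample the companded error is far smaller than the linear one, at the cost of coarser steps near full scale: the curve spends more codes on small amplitudes and fewer on loud ones.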

Raw Storage and Transmission

  • Stereo (or more) means one sample per channel at each time step, interleaved in the "obvious" way: frames. Frames are typically stored / transmitted sequentially: for a stereo signal (e.g. with left before right)

          L0 R0          L1 R1          L2 R2
          frame0         frame1         frame2
  • Lots is implicit: Sample rate? Frame size in channels? Frame order? Sample size? Sample endianness? Interpretation of sample bits?

    • Transmission: Usually these things are specified by some external protocol

    • Storage: Often many of these things are recorded in a header for flexibility
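
As a sketch of the difference between raw frames and a file with a header, the following interleaves two channels, packs them as signed 16-bit little-endian samples, and then wraps the same bytes in a WAV container using Python's standard wave module. The left-before-right order, the 44100 sps rate and the helper names interleave and pack_s16le are assumptions made for the example.

```python
import io
import struct
import wave

def interleave(left, right):
    """Build the frame sequence L0 R0 L1 R1 ... from two equal-length channels."""
    frames = []
    for l, r in zip(left, right):
        frames.extend((l, r))
    return frames

def pack_s16le(samples):
    """Pack integers as signed 16-bit little-endian: raw bytes with no metadata at all."""
    return struct.pack("<%dh" % len(samples), *samples)

left = [0, 1000, 2000]
right = [0, -1000, -2000]
raw = pack_s16le(interleave(left, right))   # 3 frames * 2 channels * 2 bytes = 12 bytes

# The same samples with a header: the WAV file records channel count,
# sample width and sample rate, so a reader does not have to guess them.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)        # frame size in channels
    w.setsampwidth(2)        # sample size in bytes
    w.setframerate(44100)    # sample rate in frames per second
    w.writeframes(raw)
wav_bytes = buf.getvalue()
```

The raw buffer is only meaningful to a receiver that already knows every one of the implicit parameters listed above; the WAV bytes carry most of them along.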

Last modified: Tuesday, 31 March 2020, 7:58 PM