CS 410/510 SOUND Sp2019: Analog and Digital Sound Representation

Natural Sound - Resonance

It is easy to produce sound that contains a jumble of frequencies: wind, impulse/rattling/vibrating
A resonant cavity is a filter: it amplifies frequencies whose wavelengths fit the cavity (and multiples of those frequencies) and suppresses other frequencies
Most sound-producing things operate in or with a resonant cavity: voice, instruments, etc.
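To make the "filter" idea concrete, here is a minimal sketch for the simplest cavity: an idealized tube open at both ends. Its resonances are whole-number multiples of a fundamental, f_n = n * c / (2 * L). The speed of sound (343 m/s) and the ignored end corrections are assumptions, and `open_pipe_resonances` is a hypothetical helper. Real instruments deviate from this ideal.

```python
SPEED_OF_SOUND = 343.0  # m/s in air near room temperature (assumed)

def open_pipe_resonances(length_m, count, c=SPEED_OF_SOUND):
    """First `count` resonant frequencies (Hz) of an idealized open-open tube.

    The tube resonates when its length holds a whole number of half-wavelengths,
    so f_n = n * c / (2 * L). End corrections are ignored.
    """
    return [n * c / (2.0 * length_m) for n in range(1, count + 1)]

print(open_pipe_resonances(0.5, 4))   # [343.0, 686.0, 1029.0, 1372.0]
print(open_pipe_resonances(0.25, 1))  # [686.0]: half the length, double the pitch
```

The second call shows why shortening the effective cavity (opening a hole, pulling in a slide) raises the pitch.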

Noisemaker + resonant cavity
- Wind: buzzing lips or reed + tube
- String: vibrating string + usually cavity
- Percussion: impulse + usually cavity
- Misc
Pitch is adjusted by tension or length; cavity length is modified via holes (or a slide): so many choices
Most, but not all, are monophonic: one sound at a time

Ideally, the electrical signal exactly represents the sound pressure
In practice, the signal path may introduce distortion
- Nonlinearity: the signal doesn't accurately track the sound pressure
- History: the past signal influences the current signal
We will talk about "total harmonic distortion" (THD) at some point
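As a preview of how nonlinearity shows up as harmonics, here is a small sketch. It passes a pure sine through a hypothetical mildly nonlinear path, y = x + 0.1 * x^3, and measures THD. THD here is taken as the root-sum-square of the harmonic amplitudes divided by the fundamental amplitude. That is one common definition, and others exist. The window holds a whole number of cycles, so each harmonic lands exactly on a DFT bin. The helper names are hypothetical.

```python
import math

def bin_amplitude(signal, k):
    """Amplitude of the component at DFT bin k (0 < k < len(signal) / 2)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

def thd(signal, fundamental_bin):
    """RSS of harmonic amplitudes divided by the fundamental amplitude.

    Assumes a whole number of cycles in the window; harmonics at or above
    the Nyquist bin (n / 2) are skipped.
    """
    n = len(signal)
    top = (n - 1) // 2 // fundamental_bin          # highest harmonic below n / 2
    harmonics = [bin_amplitude(signal, h * fundamental_bin) for h in range(2, top + 1)]
    return math.sqrt(sum(a * a for a in harmonics)) / bin_amplitude(signal, fundamental_bin)

N, CYCLES, A = 1000, 10, 0.1
clean = [math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]
distorted = [x + A * x ** 3 for x in clean]   # hypothetical nonlinear signal path
print(thd(clean, CYCLES))       # essentially 0
print(thd(distorted, CYCLES))   # about 0.0233
```

The exact answer follows from sin^3(t) = (3 sin t - sin 3t) / 4. The cubic term adds a 3rd harmonic of amplitude A/4 and bumps the fundamental to 1 + 3A/4, so THD = (A/4) / (1 + 3A/4).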

Turn electrical signal into air pressure change
A wire solenoid attached to a paper cone
Typically in a resonant cavity (speaker cabinet)
The speaker solenoid's motion roughly tracks changes in the current through the wire
Low frequencies have long wavelengths, so they need a big speaker or a pair of separated speakers (long baseline): the "woofer"
- Cf. Huygens's Principle
High frequencies need a fast response time: a tiny speaker (maybe piezoelectric), the "tweeter"
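The woofer/tweeter split follows from the wavelength, λ = c / f, across the audible range (roughly 20 Hz to 20 kHz). A quick sketch, assuming c = 343 m/s; `wavelength_m` is a hypothetical helper:

```python
SPEED_OF_SOUND = 343.0  # m/s, air near room temperature (assumed)

def wavelength_m(freq_hz, c=SPEED_OF_SOUND):
    """Wavelength in metres of a sound wave at freq_hz."""
    return c / freq_hz

for f in (20, 1000, 20000):
    print(f"{f:>6} Hz -> {wavelength_m(f):.5f} m")
# 20 Hz is about 17 m long; 20 kHz is under 2 cm
```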

Turn sound into electrical signal: usually with microphone
A microphone varies resistance, capacitance or voltage (a reversed speaker) depending on the air pressure differential between its front and back
Many variations on this theme
Microphones are bad: noisy, nonlinear devices; usually the limiting factor in the sound chain

We now know how to build something like a telephone or record player or stomp box:
- Use a microphone to convert air pressure to voltage
- Maybe process the voltage somehow: store it somewhere or modify it with circuitry
- Use a speaker to convert voltage back to sound

"Feedback" is a classic oscillation effect:
- Sound coming out the speaker and back into the microphone interacts with speaker + microphone + air as a resonance
- The resonant frequency depends on the distance between microphone and speaker
- If the loop has net positive gain at some frequency (amplification)…
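A rough sketch of the loop follows. It makes several idealizing assumptions: the loop is modeled as a pure acoustic delay (distance / speed of sound) with a single gain, and the phase shifts of the amplifier, speaker, microphone and room are ignored. Under those assumptions, the frequencies that line up with themselves after one round trip are whole multiples of 1 / delay, and any round-trip gain above 1 makes the signal grow on every trip. All function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s (assumed)

def loop_delay_samples(distance_m, sample_rate, c=SPEED_OF_SOUND):
    """Speaker-to-microphone delay, rounded to whole samples."""
    return round(distance_m / c * sample_rate)

def candidate_howl_frequencies(distance_m, count, c=SPEED_OF_SOUND):
    """Frequencies whose period divides the acoustic delay evenly (idealized)."""
    delay_s = distance_m / c
    return [k / delay_s for k in range(1, count + 1)]

def simulate_feedback(gain, delay, length):
    """Impulse response of the delay loop y[n] = x[n] + gain * y[n - delay]."""
    y = [0.0] * length
    for n in range(length):
        x = 1.0 if n == 0 else 0.0                     # single click into the mic
        y[n] = x + (gain * y[n - delay] if n >= delay else 0.0)
    return y

delay = loop_delay_samples(3.43, 48000)           # 10 ms -> 480 samples at 48 kHz
print(candidate_howl_frequencies(3.43, 3))        # about [100, 200, 300] Hz
print(simulate_feedback(1.1, delay, 4801)[4800])  # 1.1 ** 10: growing
print(simulate_feedback(0.5, delay, 4801)[4800])  # 0.5 ** 10: dying away
```

In a real system the growth stops only when something saturates, which is the familiar howl.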

Representation of analog sound as an electrical signal is potentially awesome: high accuracy in time, can represent very high and low frequencies well
In practice, there are problems:
- Any "noise" (unwanted signal) is also very accurately represented
- Analog signal storage devices are clunky and don't work well: records, tapes, etc.
- Manipulating electrical signals requires complex, expensive, special-purpose electronics
- "Audiophiles" love this stuff, so you have to deal with them (could be worst problem)

"Digitizing" analog sound solves our problems:
- Somewhat noise-immune
- Can be stored in digital memory
- Can be manipulated with a simple computer
- Audiophiles hate it
What representation should we choose? Discretize the analog signal in time and amplitude as "samples"
Simplest to use a uniform sampling interval and a binary integer representation of sample values
- High sampling rates and lots of bits are more accurate, but "wasteful" for slowly-varying signals
This is often called "Pulse Code Modulation" (PCM), and is the basis of most ("time-domain") digital representations of sound
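A minimal LPCM sketch, assuming input floats in -1..1 and a signed 16-bit target. It scales by 32767 so the levels are symmetric around zero, which is one of several conventions. It also computes the raw data rate, which shows how quickly "lots of bits" adds up. The function names are hypothetical.

```python
def bit_rate(sample_rate, bits_per_sample, channels):
    """Raw LPCM data rate in bits per second."""
    return sample_rate * bits_per_sample * channels

def to_lpcm16(samples):
    """Quantize floats in [-1.0, 1.0] to signed 16-bit integer codes."""
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))       # clamp anything out of range
        codes.append(round(x * 32767))   # nearest of 65535 evenly spaced levels
    return codes

def from_lpcm16(codes):
    """Interpret codes linearly: the value is just code / 32767."""
    return [c / 32767 for c in codes]

print(to_lpcm16([0.0, 0.5, -1.0, 1.5]))  # [0, 16384, -32767, 32767]
print(bit_rate(44100, 16, 2))            # 1411200 bits/s for CD-style stereo
```

Every sample costs the same 16 bits whether the signal is busy or nearly silent. That is the "wasteful" part.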
Usually PCM is "Linear Pulse Code Modulation" (LPCM): the binary sample values are interpreted directly. Sometimes a companding function is applied to the sample values (e.g. A-law, μ-law) to get a decent representation with fewer bits: see below
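A sketch of the μ-law idea uses the continuous curve F(x) = sgn(x) ln(1 + μ|x|) / ln(1 + μ) with μ = 255. Note the assumption: the ITU-T G.711 codec uses a segmented, piecewise-linear approximation of this curve with its own bit layout, and that codec is not what this code implements. The 8-bit rounding helper `quantize8` is hypothetical.

```python
import math

MU = 255.0  # the mu value used by mu-law telephony

def mu_law_compress(x, mu=MU):
    """Continuous mu-law curve: maps [-1, 1] to [-1, 1], stretching small values."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse curve: sgn(y) * ((1 + mu) ** |y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def quantize8(v):
    """Round a value in [-1, 1] to one of the signed 8-bit levels -127..127."""
    return round(v * 127) / 127

x = 0.01  # a quiet sample
print(abs(mu_law_expand(quantize8(mu_law_compress(x))) - x))  # roughly 1e-5
print(abs(quantize8(x) - x))                                  # roughly 2e-3
```

With the same 8 bits, the companded version spends more levels on quiet sounds. Quiet sounds are where linear 8-bit quantization is crudest.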

Sound is a fundamentally frequency-domain (sum of sinusoids) thing: PCM treats it as time-domain
A particularly striking example of this is the "Nyquist Limit"
To make PCM work well, we need to ensure that we don't try to represent signals that vary quickly relative to the sampling rate
Specifically, we need to ensure that frequencies above half the sample rate are not present in the underlying signal (this is a strange way of putting things, but the math checks out)
We will return to this topic throughout the course
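One way to see the Nyquist limit in action: sample a 5000 Hz cosine at 8000 samples per second, which is above the 4000 Hz limit. The samples come out identical to those of a 3000 Hz cosine, so after sampling the two cannot be told apart. This sketch uses cosines so the phase lines up exactly; a sine would come back with flipped sign. The helper names are hypothetical.

```python
import math

def sample_cosine(freq_hz, sample_rate, count):
    """Sample cos(2*pi*f*t) at t = n / sample_rate for n = 0..count-1."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]

def alias_frequency(freq_hz, sample_rate):
    """Frequency in [0, sample_rate / 2] that a sampled sinusoid at freq_hz looks like."""
    f = freq_hz % sample_rate
    return min(f, sample_rate - f)

fs = 8000
above = sample_cosine(5000, fs, 16)                       # above the 4000 Hz limit
alias = sample_cosine(alias_frequency(5000, fs), fs, 16)  # 3000 Hz
print(max(abs(a - b) for a, b in zip(above, alias)))      # tiny: rounding error only
```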

Sound is represented as sequence of samples: numbers
- Usually fixed-width integers: signed or unsigned
- Floating point can also be a thing
- Units are complicated: usually values are just normalized to the range of the integer type, or to 0..1 or -1..1 for floats
There is some specified sample rate in samples per second: note that the sample rate must be more than 2× the maximum frequency in Hz, because Nyquist
Stereo (or more channels): one sample per channel at each time step, interleaved in the "obvious" way; each such group of samples is a "frame"
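Here is a small sketch of unpacking interleaved frames. It assumes signed 16-bit little-endian samples, which is what 16-bit PCM WAV data uses. It normalizes by 32768, so the most negative code maps to exactly -1.0; scaling by 32767 is an equally common alternative. `deinterleave_s16le` is a hypothetical name.

```python
import struct

def deinterleave_s16le(data, channels):
    """Split interleaved little-endian int16 PCM bytes into per-channel float lists.

    Each frame holds one sample per channel (L, R, L, R, ... for stereo).
    Values are scaled by 1/32768, giving the range [-1.0, 1.0).
    """
    if len(data) % (2 * channels):
        raise ValueError("data is not a whole number of frames")
    samples = struct.unpack(f"<{len(data) // 2}h", data)
    return [[s / 32768 for s in samples[ch::channels]] for ch in range(channels)]

raw = struct.pack("<4h", 100, -200, 16384, -32768)  # two stereo frames
left, right = deinterleave_s16le(raw, 2)
print(left)   # [0.0030517578125, 0.5]
print(right)  # [-0.006103515625, -1.0]
```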

Band-limited via Nyquist (approximation in time)
Quantization due to finite representation (approximation in amplitude)
Assumes an idealized sampling clock: clock "skew" and "jitter" are a thing for real clocks
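The amplitude approximation can be measured directly. This sketch rounds a near-full-scale sine to N-bit integers, treats the rounding error as noise, and compares the measured signal-to-noise ratio with the usual rule of thumb of about 6.02 N + 1.76 dB. The rule assumes a full-scale sine and uniformly distributed rounding error. The test tone (997 Hz at 48 kHz) and the helper name are arbitrary choices for illustration.

```python
import math

def quantization_snr_db(bits, sample_rate=48000, freq_hz=997, count=48000):
    """Measured SNR (dB) of a near-full-scale sine rounded to `bits`-bit integers."""
    peak = 2 ** (bits - 1) - 1          # largest positive code
    signal_power = 0.0
    noise_power = 0.0
    for n in range(count):
        x = peak * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        q = round(x)                    # quantize to the nearest integer code
        signal_power += x * x
        noise_power += (q - x) ** 2     # the rounding error is the "noise"
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 12, 16):
    print(bits, round(quantization_snr_db(bits), 1), round(6.02 * bits + 1.76, 1))
```

Each extra bit buys roughly 6 dB. That is why 16-bit audio is usually quoted at around 96 to 98 dB.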

Need to take a binary number to a voltage
Classic method: direct conversion via R/2R Ladder
- Very fast, simple
- Accuracy issues are real: bit voltages and component values must be matched pretty exactly
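A sketch of why the ladder works, and why matching matters, follows. It assumes a voltage-mode R/2R ladder: bit k (LSB = 0) drives a 2R rung to vref or 0 V, adjacent nodes are joined by R, the LSB end is terminated by 2R to ground, and the output is taken unloaded at the MSB node. The code walks from the LSB end toward the MSB, keeping a Thevenin equivalent. The optional resistor lists, all hypothetical parameters, let you inject mismatched parts.

```python
def r2r_output(code, bits, vref=1.0, r=1000.0, rungs=None, series=None, term=None):
    """Unloaded output voltage of a voltage-mode R/2R ladder DAC."""
    rungs = rungs or [2 * r] * bits            # 2R rung resistor per bit, LSB first
    series = series or [r] * (bits - 1)        # R between adjacent nodes
    term = term if term is not None else 2 * r
    v_th, r_th = 0.0, term                     # Thevenin model of the terminated LSB end
    for k in range(bits):
        if k > 0:
            r_th += series[k - 1]              # step through one series R toward the MSB
        v_bit = vref if (code >> k) & 1 else 0.0
        g1, g2 = 1.0 / r_th, 1.0 / rungs[k]
        v_th = (v_th * g1 + v_bit * g2) / (g1 + g2)   # two sources in parallel
        r_th = 1.0 / (g1 + g2)
    return v_th

print([round(r2r_output(c, 4), 4) for c in range(16)])  # c / 16 for an ideal ladder
print(r2r_output(8, 4, rungs=[2000.0, 2000.0, 2000.0, 2020.0]))  # MSB rung 1% high
```

With ideal parts, the output is exactly vref × code / 2^bits. Making just the MSB rung 1% high moves code 8 to about 0.4975 instead of 0.5, a quarter of an LSB at only 4 bits. At 16 bits, the same error would span hundreds of LSBs.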
Classic method: Pulse Width Modulation (PWM)
- Digital all the way to single-wire output
- Arbitrary resolution, limited by timing precision
- Really hard to get the filtering right for audio applications: need a super-high pulse rate
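A minimal single-edge PWM sketch: each sample period is split into a fixed number of clock ticks, and the output stays high for a fraction of them equal to the sample value (assumed to be in 0..1). Averaging each period stands in for the low-pass filter. This is an idealization, since a real analog filter does not know where the period boundaries are. The last function shows the cost: N bits of duty-cycle resolution need 2^N ticks per sample. All names are hypothetical.

```python
def pwm_encode(samples, ticks_per_sample):
    """Turn samples in [0.0, 1.0] into a 0/1 pulse train, one pulse per sample period."""
    bits = []
    for v in samples:
        high = round(max(0.0, min(1.0, v)) * ticks_per_sample)  # ticks spent high
        bits.extend([1] * high + [0] * (ticks_per_sample - high))
    return bits

def pwm_average(bits, ticks_per_sample):
    """Idealized reconstruction: average each sample period (a boxcar low-pass)."""
    return [sum(bits[i:i + ticks_per_sample]) / ticks_per_sample
            for i in range(0, len(bits), ticks_per_sample)]

def pwm_clock_hz(sample_rate, resolution_bits):
    """Pulse clock needed for `resolution_bits` of duty-cycle resolution."""
    return sample_rate * 2 ** resolution_bits

print(pwm_average(pwm_encode([0.0, 0.25, 0.5, 1.0], 8), 8))  # [0.0, 0.25, 0.5, 1.0]
print(pwm_clock_hz(44100, 16))  # 2890137600: about 2.9 GHz for 16 bits at 44.1 kHz
```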
Fancy methods: can talk about later if folks are interested

Convert voltage on wire to binary number
This is the "hard" direction: the DAC tricks aren't invertible
One common approach uses some combination of DACs and comparators to try to make the DAC output match the analog input
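Successive approximation (SAR) is one well-known converter built this way. It runs a binary search, one bit per step from MSB to LSB, using a DAC and a single comparator. The sketch below assumes an ideal DAC (code × vref / 2^bits) and an ideal comparator. The function names are hypothetical.

```python
def ideal_dac(code, bits, vref):
    """Ideal DAC: the fraction code / 2**bits of the reference voltage."""
    return code * vref / 2 ** bits

def sar_adc(vin, bits=8, vref=1.0):
    """Successive-approximation conversion of vin (0 <= vin < vref)."""
    code = 0
    for k in reversed(range(bits)):           # try each bit, MSB first
        trial = code | (1 << k)
        if ideal_dac(trial, bits, vref) <= vin:   # the comparator decision
            code = trial                      # keep the bit: DAC still at or below input
    return code

print(sar_adc(0.3))   # 76, since 76/256 <= 0.3 < 77/256
print(sar_adc(0.0))   # 0
```

An N-bit conversion takes N comparator decisions, so the DAC's accuracy and settling time directly limit the ADC.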
Discussion

Last modified: Monday, 8 April 2019, 8:11 PM