CS 410/510 SOUND Sp2019: Time, Frequency, and the DFT

Sound Is Frequencies

Most sounds have high periodicity
Fourier’s Theorem (FOO-ree-YAY or thereabouts) says that an infinitely repeating sound can be represented as a sum of sinusoids
The ear hears/decomposes a sum of sinusoids
Yet PCM is a sequence of samples over time: frequency is not represented
- The Nyquist Limit is hard to think of as a signal change over time thing

We have: a continuous waveform, a function f(t) representing sound pressure
We want: a continuous spectrum, showing the amplitude and phase of sine waves at every frequency f̂(ω)
Wait, amplitude and phase from a single function? Yes, representing a frequency as a complex number with the usual geometric interpretation

\[ f = a + b i, |f| = \sqrt{a^2 - b^2}, \theta(f) = tan^{-1}(a, b) \]
Note: you will see both i and j for \(\sqrt{-1}\) in different contexts
Note: we freely mix between angular frequency \(\omega\) and “normal” frequency f (dammit) using

\[ \omega = 2 \pi f \]

because once around the circle is one cycle

Euler’s Formula says complex exponential is a sinusoid:

\[ e^{i (\omega t + \theta)} = cos(\omega t + \theta) + i~sin(\omega t + \theta) \]

\[ = e^{i \omega t} e^{i \theta} \]
Starting point for “phasor analysis”
Now our sum of sinusoids can be represented as a sum of exponentials, making things easier (?)

Reminder: Fourier claims that every \(f(t)\) can be represented as some \(\hat{f}(\omega)\) (more or less)
We think of the first kind of thing as “in the frequency domain”, the second as “in the time domain”
Converting from a single frequency to its time domain representation is “easy”:

\[ f(t) = e^{-i \omega t} \]
Even for a single sinusoid, converting the other way isn’t immediately obvious

Let’s just get the Fourier Transform out there:

\[ \hat{f}(\omega) = \int_{-\infty}^{\infty} f(t) e^{-i \omega t} dt \]
- The minus sign in the exponent is fiddly
- The infinite integral is alarming
- Still, we now can math up what we wanted
What about going the other way? Turns out that

\[ f(t) = \frac{1}{2 \pi} \int_{-\infty}^{\infty} \hat{f}(\omega) e^{i \omega t} d\omega \]
So the transform is almost self-inverse

Let’s “get rid of” the infinities by just taking limits

\[ \hat{f}(\omega) = \lim_{W \rightarrow \infty} \int_{-W}^{W} f(t) e^{-i \omega t} dt \]

A very long signal relative to its period should be well-approximated
Now a fiddly argument ensues. If the signal is periodic, we should be able to replace the limits by the period: after all, it doesn’t change anything

\[ \hat{f}(\omega) = \int_{0}^{P} f(t) e^{-i \omega t} dt \]
Let’s turn the integral into a sum by treating \(f(t)\) as a series of impulses: nonzero only at discrete timesteps

\[ X[k] = \sum_{n=0}^{N-1} x[n] e^{-i k n / N} \]
The frequency-domain value at unit frequency k, namely \(X[k]\) has been written as a weighted sum of the discrete time-domain values \(x[i]\) over the period \(N\)
The inverse transformation works similarly

\[ x[n] = \frac{1}{2 \pi N} \sum_{k=0}^{N-1} X[n] e^{i k n / N} \]

The Xs are still complex (!)
\(x[n]\) is treated as cyclic
The inverse is computed only as the sum of discrete frequencies
The sampling rate is handled here by unit frequency. Frequency needs to be scaled by the sample rate to be meaningful
This math is so full of fiddly that it is really easy to get it screwed up
Frequency resolution is a function of sampled period
The transforms as given are \(O(N^2)\)
The DFT can be thought of as a Gaussian Filter bank

A dynamic programming trick gets the time complexity of the DFT down to \(O(N \lg N)\)
No, I’m not going to explain it here: treat it as a black box
The FFT gives exactly the same answers as the DFT, just faster
Limitation: The FFT requires a power-of-two number of samples
Limitation: The output frequencies must be linear, which may not be what you wished for

Triangle, square: Fundamental plus “odd harmonics”
- Difference between triangle and square is decay rate, phase
- Odd harmonics are a characteristic of many kinds of distortion
“White noise” (random samples): Flattish noise spectrum (“noise is a sum of random frequencies”)

Fundamentally ill-defined: FT only works right on an infinite periodic signal, throws away all information about changes in frequency
Heisenbergy: Smaller DFT gives more local information about frequency, but at the cost of poorer accuracy. Low frequency signals, in particular, will not be captured
“Overlapping” heuristic: Do a small DFT, slide over a few samples, do it again

DFT treats input as cyclic: if not analyzing over exactly an integer number of periods, trouble
In particular, a step change corresponds to a smooth spectrum, so endpoints must match (also all derivatives)
“Windowing” heuristic: Taper the time-domain signal off toward the edges
A window corresponds to a low-pass filter: more next lecture

Pretty straightforward: do the DFT, look for peaks etc in the spectrum
Still have to do the analysis of the spectrum, which involves some kind of other math or heuristics
There are more sensitive frequency analysis methods that do not involve the DFT

Pretty straightforward: do the DFT, adjust frequencies as desired, do the inverse DFT
See localization in time: will end up doing “overlapping windowed” DFT to track changes input frequencies
Expensive compared to “direct” filtering we will discuss next time

Last modified: Tuesday, 9 April 2019, 11:12 AM