Wavelets for Sound
How a physicist, a geophysicist, and a computer-music researcher in Marseille transformed signal analysis — and gave us a new way to hear.
The wavelet transform was born from frustration with Fourier analysis. Jean Morlet, exploring seismic signals at Elf Aquitaine in the early 1980s, needed a method that could localise both frequency and time. Fourier gave him one but not the other.
What made Marseille unique was the convergence: a theoretical physicist (Grossmann), a geophysicist with engineering intuition (Morlet), and a computer musician who needed real-time tools (Kronland-Martinet) — all working at CNRS laboratories within walking distance of each other on the Luminy campus, nestled in the calanques above the Mediterranean.
The central idea: instead of comparing a signal to infinite sinusoids (Fourier), compare it to shifted and scaled copies of a single localised waveform — a "wavelet."
$$W_f(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} f(t)\, \psi^*\!\left(\frac{t - b}{a}\right) dt$$

Here $a$ is the scale (controlling frequency resolution — large $a$ catches low frequencies, small $a$ catches high), $b$ is the translation (position in time), and $\psi$ is the "mother wavelet." The factor $1/\sqrt{|a|}$ normalises energy across scales.
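In code, each coefficient is just an inner product between the signal and one shifted, scaled, energy-normalised copy of the mother wavelet. A minimal numpy sketch (function names are illustrative, not from any library):

```python
import numpy as np

def morlet(t, w0=5.0):
    """Complex Morlet wavelet: a sinusoid inside a Gaussian envelope
    (the small admissibility correction term is omitted for clarity)."""
    return np.pi**-0.25 * np.exp(1j * w0 * t) * np.exp(-t**2 / 2)

def cwt_coefficient(sig, t, a, b, w0=5.0):
    """One CWT coefficient: correlate the signal with the mother wavelet
    scaled by a, shifted to b, and energy-normalised by 1/sqrt(|a|)."""
    psi_ab = morlet((t - b) / a, w0) / np.sqrt(abs(a))
    dt = t[1] - t[0]
    return np.sum(sig * np.conj(psi_ab)) * dt

# A 2 Hz tone responds strongly at the matching scale and weakly elsewhere.
t = np.linspace(-5, 5, 4096)
f0 = 2.0
sig = np.cos(2 * np.pi * f0 * t)
a_match = 5.0 / (2 * np.pi * f0)   # scale whose centre frequency w0/a is 2*pi*f0
on = abs(cwt_coefficient(sig, t, a_match, 0.0))        # large response
off = abs(cwt_coefficient(sig, t, 4 * a_match, 0.0))   # mismatched scale: tiny
```

Sweeping `b` across the signal and `a` across scales fills in the whole time-scale plane.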
The Short-Time Fourier Transform uses a window of fixed width. This means you get the same time resolution at every frequency — but musically, a 50 Hz bass note needs a much longer observation window than a 5 kHz transient. The wavelet transform adapts: it automatically uses long windows for low frequencies and short windows for high frequencies. This is exactly how musical pitch perception works — the ear is a constant-Q analyser.
Below, a frequency-sweeping chirp is analysed with both methods. The STFT spectrogram (left) smears the chirp across time at low frequencies. The wavelet scalogram (right) tracks it cleanly at all scales.
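The same experiment can be run numerically. The sketch below (plain numpy, using the standard frequency-domain form of the Morlet CWT; the scale-to-frequency map $a = \omega_0 / 2\pi f$ is a common convention, not from the article) computes a Morlet scalogram of a linear chirp and reads off the ridge:

```python
import numpy as np

fs = 1024
t = np.arange(0, 2, 1 / fs)
# Linear chirp: instantaneous frequency f(t) = 20 + 90 t  (Hz).
sig = np.cos(2 * np.pi * (20 * t + 45 * t**2))

w0 = 6.0
freqs = np.linspace(10, 250, 60)
scales = w0 / (2 * np.pi * freqs)       # Morlet scale <-> centre-frequency map
n = len(sig)
omega = 2 * np.pi * np.fft.fftfreq(n, 1 / fs)
sig_f = np.fft.fft(sig)

# One FFT-domain bandpass per scale: psi_hat(a*omega) is a Gaussian centred
# on w0/a, applied to positive frequencies only (analytic wavelet).
scalogram = np.empty((len(scales), n))
for i, a in enumerate(scales):
    psi_hat = np.exp(-0.5 * (a * omega - w0)**2) * (omega > 0)
    scalogram[i] = np.abs(np.fft.ifft(sig_f * psi_hat * np.sqrt(a)))

# The ridge (the max-energy frequency at each instant) tracks the chirp.
ridge = freqs[np.argmax(scalogram, axis=0)]
```

At $t = 0.5$ s the ridge sits near 65 Hz and near 155 Hz at $t = 1.5$ s, matching $f(t) = 20 + 90t$ to within the frequency grid.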
Click and drag on the signal below to slide a Morlet wavelet across it. The highlighted cell in the scalogram updates in real time, showing exactly which coefficient is computed at that position and scale.
For the transform to be invertible — so we can reconstruct the signal from its wavelet coefficients — the wavelet must satisfy the admissibility condition:
$$C_\psi = \int_{-\infty}^{\infty} \frac{|\hat{\psi}(\omega)|^2}{|\omega|}\, d\omega < \infty$$

This requires that the wavelet has zero mean (its integral over all time is zero) and that its Fourier transform $\hat{\psi}(\omega)$ decays sufficiently fast. The Morlet wavelet satisfies this (approximately — a small correction term is needed), while pure Gaussians do not.
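The zero-mean requirement is easy to check numerically (the values below are for $\omega_0 = 5$; the uncorrected Morlet's mean is nonzero but negligible):

```python
import numpy as np

t = np.linspace(-20, 20, 200001)
dt = t[1] - t[0]

gaussian = np.exp(-t**2 / 2)                       # pure Gaussian: not admissible
morlet = np.exp(1j * 5.0 * t) * np.exp(-t**2 / 2)  # Morlet, w0 = 5, no correction

mean_gauss = abs(np.sum(gaussian) * dt)   # sqrt(2*pi) ~ 2.507: far from zero
mean_morlet = abs(np.sum(morlet) * dt)    # sqrt(2*pi)*exp(-w0^2/2) ~ 9e-6
```

The oscillation under the Gaussian envelope is what cancels the mean — and the larger $\omega_0$, the smaller the residual.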
$$f(t) = \frac{1}{C_\psi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} W_f(a, b)\, \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{t - b}{a}\right) \frac{da\, db}{a^2}$$

This double integral reconstructs the original signal perfectly from its wavelet coefficients. Grossmann and Morlet proved this rigorously using the theory of square-integrable group representations — specifically, the affine group (translations and dilations). This was the mathematical breakthrough that separated wavelets from earlier ad-hoc multiscale methods.
Every analysis method is constrained by the Heisenberg uncertainty principle: you cannot simultaneously resolve time and frequency arbitrarily precisely. The product of time spread $\Delta t$ and frequency spread $\Delta\omega$ always satisfies $\Delta t\, \Delta\omega \geq 1/2$. The difference lies in how that uncertainty is distributed across frequencies.
Design a custom wavelet by adjusting its parameters and see whether it satisfies admissibility — whether its Fourier transform decays correctly and its mean is zero.
The wavelet that started it all: a complex sinusoid inside a Gaussian envelope. It is the closest relative of musical tones in the wavelet family.
The parameter $\omega_0$ controls the number of oscillations inside the Gaussian window. A typical choice is $\omega_0 = 5$ or $6$, giving good frequency resolution. Lower values give better time resolution at the cost of frequency precision.
Kronland-Martinet emphasised that the Morlet wavelet is particularly suited to audio because its shape mirrors the structure of musical tones: an oscillation modulated by an amplitude envelope. When dilated, it behaves like a constant-Q filter bank — each scale corresponds to a filter whose bandwidth is a fixed fraction of its centre frequency, just like musical intervals.
The quality factor $Q = f_0/\Delta f$ (centre frequency divided by bandwidth) stays constant across all scales. On a piano, each octave has the same number of notes regardless of register. The wavelet transform's log-frequency spacing naturally matches this musical structure, whereas the STFT's linear frequency bins lack resolution at low frequencies and waste it at high frequencies for musical applications.
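Constant Q follows directly from the Morlet's Gaussian spectrum: dilation moves the spectral centroid and the bandwidth by the same factor $1/a$, so their ratio is scale-independent. A numerical check (with the RMS-bandwidth convention, under which $Q = \sqrt{2}\,\omega_0$ — a convention choice, not a universal definition):

```python
import numpy as np

w0 = 6.0
f = np.linspace(0.01, 100, 400000)        # frequency axis in Hz

def q_factor(a):
    """Q = centre frequency / RMS bandwidth of the scale-a Morlet."""
    mag2 = np.exp(-(a * 2 * np.pi * f - w0)**2)     # |psi_hat(a*omega)|^2
    fc = np.sum(f * mag2) / np.sum(mag2)            # spectral centroid
    bw = np.sqrt(np.sum((f - fc)**2 * mag2) / np.sum(mag2))
    return fc / bw

q_low, q_high = q_factor(1.0), q_factor(0.1)   # centres near 0.95 Hz and 9.5 Hz
```

Both values come out near $\sqrt{2}\cdot 6 \approx 8.49$ despite the decade of separation between the two centre frequencies.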
The Morlet wavelet is one member of a broader family. Each has different trade-offs between time localisation, frequency localisation, oscillation count, and admissibility.
Kronland-Martinet's 1987 and 1988 papers demonstrated that wavelet scalograms reveal features of musical sounds that spectrograms obscure: attack transients, vibrato trajectories, the onset structure of piano notes, and the formant evolution of speech.
Choose a signal type and watch the CWT scalogram build row by row. Each row is computed by sliding the Morlet wavelet at a particular scale across the signal and recording the correlation at each position.
In their analysis of piano sounds, the wavelet representation revealed that the attack portion — the first few tens of milliseconds — contains a broadband burst that excites all harmonics simultaneously, followed by a gradual decay where each partial has its own damping rate. Higher harmonics decay faster, which is why a piano note "mellows" over time. The wavelet scalogram captures this evolution naturally, while a single FFT frame cannot.
For speech, they showed that the wavelet transform tracks the rapid formant transitions during consonant-vowel boundaries with far greater temporal precision than the STFT, because the high-frequency wavelets (small scales) provide millisecond-level time resolution.
Grossmann and Kronland-Martinet showed that for signals with slowly-varying frequency, the ridge of the wavelet transform — the curve of maximum magnitude across scales at each time — gives a direct estimate of the instantaneous frequency. This became a powerful tool for tracking pitch contours, vibrato, and glissandi in musical signals.
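A sketch of the idea for a vibrato tone, using a single CWT row at the ridge scale (frequency-domain Morlet filtering; parameters are illustrative). The phase derivative of the complex coefficients recovers the instantaneous frequency, though the wavelet's passband slightly compresses the recovered vibrato depth:

```python
import numpy as np

fs = 2000
t = np.arange(0, 1, 1 / fs)
f0, fv, dv = 100.0, 5.0, 10.0   # carrier, vibrato rate, vibrato depth (Hz)
# Tone whose instantaneous frequency is f0 + dv*sin(2*pi*fv*t).
phase = 2 * np.pi * f0 * t - (dv / fv) * np.cos(2 * np.pi * fv * t)
sig = np.cos(phase)

# Complex Morlet CWT at the single scale matched to the carrier.
w0 = 6.0
n = len(sig)
omega = 2 * np.pi * np.fft.fftfreq(n, 1 / fs)
a = w0 / (2 * np.pi * f0)
psi_hat = np.exp(-0.5 * (a * omega - w0)**2) * (omega > 0)
W = np.fft.ifft(np.fft.fft(sig) * psi_hat)   # one ridge-scale row of the CWT

# Phase derivative of the complex coefficients = instantaneous frequency.
inst_f = np.diff(np.unwrap(np.angle(W))) * fs / (2 * np.pi)
mid = inst_f[n // 5 : 4 * n // 5]            # ignore edge effects
```

`mid` averages close to 100 Hz and oscillates at the 5 Hz vibrato rate, directly tracing the pitch contour.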
The reconstruction formula can be applied to modified wavelet coefficients to produce a transformed sound — but not every modification is legitimate. Kronland-Martinet relied on carefully chosen manipulations, such as additive resynthesis from the extracted instantaneous amplitudes and frequencies.
A crucial subtlety: the CWT maps signals from $L^2(\mathbb{R})$ into a reproducing kernel Hilbert space — a proper subspace of $L^2(\mathbb{R}^2)$. Not every function on the time-scale half-plane is the CWT of a real signal; valid coefficients must satisfy an internal consistency governed by the reproducing kernel $K(a,b;\,a',b') = \langle \psi_{a,b},\,\psi_{a',b'}\rangle$. Arbitrarily modified coefficients generally fall outside this subspace. Applying the reconstruction formula still yields some signal, but the inverse implicitly projects back onto the valid subspace, potentially distorting the intended transformation. Kronland-Martinet emphasised that for a given reconstruction kernel, not all transforms are admissible — the modification must respect the structure of the representation.
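The kernel itself is easy to compute numerically, and its broad support — neighbouring atoms overlap substantially rather than being orthogonal — is exactly the redundancy that constrains valid coefficient fields. A small sketch with illustrative parameter values:

```python
import numpy as np

t = np.linspace(-40, 40, 80001)
dt = t[1] - t[0]
w0 = 6.0

def psi(a, b):
    """Unit-energy Morlet atom at scale a, position b."""
    x = (t - b) / a
    return np.pi**-0.25 * np.exp(1j * w0 * x - x**2 / 2) / np.sqrt(a)

def K(a, b, a2, b2):
    """Reproducing kernel: inner product of two wavelet atoms."""
    return np.sum(psi(a, b) * np.conj(psi(a2, b2))) * dt

same = abs(K(1.0, 0.0, 1.0, 0.0))   # ~1: each atom has unit energy
near = abs(K(1.0, 0.0, 1.3, 0.5))   # well above zero: atoms are not orthogonal
```

Because `near` is far from zero, a valid CWT cannot assign independent values to nearby $(a, b)$ cells — their coefficients are linked through $K$.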
Using the complex Morlet wavelet, each ridge of the scalogram can be interpreted as a single partial (a sinusoidal component) of the sound. The modulus at the ridge gives the instantaneous amplitude, and the phase derivative gives the instantaneous frequency. By extracting these for each partial and resynthesising with oscillators, one obtains a high-quality reconstruction that can also be manipulated.
Kronland-Martinet noted that wavelet-based resynthesis has a key advantage over Fourier-based methods: because the wavelet analysis adapts its time resolution to frequency, transients are captured with high temporal precision while sustained tones are captured with high frequency precision. This "best of both worlds" is exactly what music demands.
In 1987, Grossmann, Holschneider, Kronland-Martinet and Morlet published a paper on detecting abrupt changes in sound signals using wavelet transforms — one of the earliest works on wavelet-based edge detection in signals.
A sharp transient — a click, a note onset, a consonant burst — creates a distinctive pattern in the wavelet domain: a vertical line of high energy across all scales, because the discontinuity excites wavelets at every frequency simultaneously. This "cone of influence" spreads outward from the transient's location, wider at lower frequencies (larger scales) and narrower at higher frequencies.
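A numerical illustration with an isolated click (frequency-domain Morlet filters, as is standard for the CWT; the parameters are illustrative):

```python
import numpy as np

fs, n = 1000, 1024
sig = np.zeros(n)
sig[400] = 1.0                            # an isolated click at sample 400

w0 = 6.0
freqs = np.geomspace(20, 400, 32)
scales = w0 / (2 * np.pi * freqs)
omega = 2 * np.pi * np.fft.fftfreq(n, 1 / fs)
sig_f = np.fft.fft(sig)
W = np.array([np.fft.ifft(sig_f * np.exp(-0.5 * (a * omega - w0)**2) * (omega > 0))
              for a in scales])

# Energy summed over scales peaks exactly at the click: all scales line up.
energy = np.abs(W).sum(axis=0)
loc = int(np.argmax(energy))

# Cone of influence: the response is wider at large scales (low frequencies).
width = lambda row: int(np.sum(np.abs(row) > 0.5 * np.abs(row).max()))
wide, narrow = width(W[0]), width(W[-1])  # W[0]: 20 Hz row, W[-1]: 400 Hz row
```

`loc` lands on sample 400, and the half-maximum width of the 20 Hz row is many times that of the 400 Hz row — the cone narrowing toward small scales.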
This capability was crucial for music: separating the "attack" portion of a note (which carries perceptual identity) from the "sustain" portion (which carries pitch) is fundamental to understanding timbre. The wavelet transform does this naturally, whereas the STFT forces a compromise between time and frequency resolution that invariably blurs one or the other.
Kronland-Martinet's 1988 Computer Music Journal paper demonstrated four ground-breaking sound transformations, all performed by manipulating wavelet coefficients before reconstruction. Because the CWT representation is highly redundant, these modifications must be chosen with care — as discussed above, arbitrary changes to coefficients may not respect the reproducing kernel, and the reconstruction will silently project the result back onto the space of valid transforms.
By interpolating the wavelet coefficients along the time axis (stretching the $b$ parameter) while keeping the scale axis unchanged, the sound is slowed down without changing its pitch: the frequency content at each scale is preserved — only the temporal evolution is altered. Because the coefficients are complex, modulus and phase must be treated separately: the interpolated phase is rescaled by the stretch factor so that each partial keeps its original rotation rate.
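A runnable sketch under simplifying assumptions: a pure tone, frequency-domain Morlet filters, and a crude reconstruction that just sums the real parts of the coefficients over log-spaced scales (correct only up to a constant factor). The modulus is interpolated along $b$, while the unwrapped phase is interpolated and multiplied by the stretch factor so each partial keeps its rotation rate — its pitch:

```python
import numpy as np

fs, n = 1000, 1000
t = np.arange(n) / fs
sig = np.cos(2 * np.pi * 50 * t)          # a 50 Hz tone, 1 second long

# Analytic-Morlet CWT on log-spaced scales.
w0 = 6.0
freqs = np.geomspace(20, 200, 48)
scales = w0 / (2 * np.pi * freqs)
omega = 2 * np.pi * np.fft.fftfreq(n, 1 / fs)
sig_f = np.fft.fft(sig)
W = np.array([np.fft.ifft(sig_f * np.exp(-0.5 * (a * omega - w0)**2) * (omega > 0))
              for a in scales])

# Stretch by 2: interpolate the modulus along b; interpolate the unwrapped
# phase and rescale it so each partial keeps its original rotation rate.
gamma = 2
b_old = np.arange(n)
b_new = np.linspace(0, n - 1, gamma * n)
mag = np.array([np.interp(b_new, b_old, np.abs(r)) for r in W])
ph = np.array([np.interp(b_new, b_old, np.unwrap(np.angle(r))) for r in W])
W2 = mag * np.exp(1j * gamma * ph)

# Crude inverse: sum real parts over scales (constant factor ignored).
rec = (np.real(W2) / np.sqrt(scales)[:, None]).sum(axis=0)

# Twice as long, same pitch: the spectral peak is still at 50 Hz.
peak_hz = np.argmax(np.abs(np.fft.rfft(rec))) * fs / (gamma * n)
```

Without the phase rescaling, the interpolated coefficients would rotate at half speed and the output would come out an octave lower as well as longer.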
By shifting the wavelet coefficients along the scale axis (moving to different scales) while keeping the time axis unchanged, the pitch is transposed without altering duration.
Zeroing out wavelet coefficients at selected scales approximately removes those frequency bands from the signal — though the result is not strictly the CWT of any signal (it violates the reproducing kernel), the reconstruction projects it onto the nearest valid transform, which in practice gives a good bandpass effect. Because the filtering is done in the time-scale plane, it can vary over time — for instance, removing high frequencies only during the sustain while preserving the full-bandwidth attack.
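A sketch of such a time-varying filter under simplifying assumptions (frequency-domain Morlet filters, a crude sum-over-scales reconstruction up to a constant factor, and an arbitrary band edge): the high band of a two-tone signal is removed only during its second half.

```python
import numpy as np

fs, n = 1000, 2000
t = np.arange(n) / fs
sig = np.cos(2 * np.pi * 100 * t) + np.cos(2 * np.pi * 300 * t)

w0 = 6.0
freqs = np.geomspace(50, 450, 64)
scales = w0 / (2 * np.pi * freqs)
omega = 2 * np.pi * np.fft.fftfreq(n, 1 / fs)
sig_f = np.fft.fft(sig)
W = np.array([np.fft.ifft(sig_f * np.exp(-0.5 * (a * omega - w0)**2) * (omega > 0))
              for a in scales])

# Time-varying lowpass: zero the small-scale rows (above ~200 Hz), but only
# in the second half of the sound, leaving the first half untouched.
W[freqs > 200, n // 2:] = 0

# Crude inverse: sum real parts over log-spaced scales (constant ignored).
rec = (np.real(W) / np.sqrt(scales)[:, None]).sum(axis=0)

half = n // 2
bin300 = int(300 * half / fs)             # FFT bin of the 300 Hz partial per half
p_first = np.abs(np.fft.rfft(rec[:half]))[bin300]
p_second = np.abs(np.fft.rfft(rec[half:]))[bin300]
```

The 300 Hz partial survives in the first half and is strongly attenuated in the second — a filter whose response changes over time, specified directly in the time-scale plane.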
The most striking demonstration: take the spectral envelope from one sound and impose it on another. The result carries the timbral "shape" of the first sound — its formants, resonances, vowel identity — while retaining the temporal structure and pitch of the second. Try "Speech (counting)" with "Rain/texture" for the classic talking-rain effect, or swap magnitude and phase to hear rain that whispers words.
One of the most powerful practical applications of the wavelet transform: separating signal from noise by selectively suppressing small wavelet coefficients, while leaving large ones untouched.
Broadband noise appears in the wavelet domain as low-magnitude coefficients spread uniformly across all scales and times. Musical signals concentrate their energy in a sparse set of large coefficients. Soft thresholding — zeroing coefficients below a threshold $\lambda$ and shrinking the rest toward zero — exploits this sparsity to denoise without introducing musical noise artefacts.
Donoho and Johnstone (1994) showed that for Gaussian noise with standard deviation $\sigma$, the "universal" threshold $\lambda = \sigma\sqrt{2 \log N}$ (for $N$ samples) is near-optimal in a minimax sense. In practice, the noise level $\sigma$ is estimated from the finest-scale coefficients, which are dominated by noise.
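Wavelet shrinkage is easy to demonstrate with an orthonormal Haar DWT — chosen here purely for simplicity; the Haar basis and the median-based noise estimate are standard practice, not prescribed by the text:

```python
import numpy as np

def haar_fwd(x, levels):
    """Multilevel orthonormal Haar transform: approximation + detail bands."""
    details = []
    a = np.asarray(x, float)
    for _ in range(levels):
        d = (a[0::2] - a[1::2]) / np.sqrt(2)
        a = (a[0::2] + a[1::2]) / np.sqrt(2)
        details.append(d)
    return a, details

def haar_inv(a, details):
    """Exact inverse of haar_fwd."""
    for d in reversed(details):
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

rng = np.random.default_rng(0)
n = 4096
t = np.arange(n) / n
clean = np.sin(2 * np.pi * 8 * t)
noisy = clean + 0.3 * rng.normal(size=n)

a, details = haar_fwd(noisy, 6)
sigma_est = np.median(np.abs(details[0])) / 0.6745   # noise level, finest scale
lam = sigma_est * np.sqrt(2 * np.log(n))             # universal threshold
details = [np.sign(d) * np.maximum(np.abs(d) - lam, 0) for d in details]
denoised = haar_inv(a, details)

mse_noisy = np.mean((noisy - clean) ** 2)
mse_denoised = np.mean((denoised - clean) ** 2)
```

The smooth signal concentrates in a few large coefficients that survive the shrinkage, while the noise, spread thinly across all of them, falls below $\lambda$ and is discarded.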
The Marseille group's work rippled outward in every direction — from pure mathematics to MP3 compression, from medical imaging to gravitational wave detection.
Kronland-Martinet continues as Director of Research at the PRISM laboratory (CNRS / Aix-Marseille University), where his group now works on sound perception, cognitive aspects of timbre, and physically-informed synthesis models. The thread from 1985 continues unbroken: wavelets revealed the structure of sound; the question now is how that structure maps to human perception and cognition.
Alex Grossmann (1930–2019) lived to see wavelets become one of the most widely used tools in applied mathematics. Born in Zagreb, a refugee from wartime Croatia, he found his way through Princeton, Brandeis, and the Courant Institute before settling in Marseille, where his collaboration with Morlet would change signal processing forever.
The Marseille papers were the seed. What grew from them, in just a decade, reshaped applied mathematics and pushed wavelets into almost every corner of signal processing:
The story told here is one chapter in a longer one. Four close cousins of the wavelet transform cover the rest of the audio time–frequency landscape:
The key publications from the Marseille collaboration and its direct descendants remain foundational reading: