Wavelets for Sound

How a physicist, a geophysicist, and a computer-music researcher in Marseille transformed signal analysis — and gave us a new way to hear.

Alex Grossmann · Jean Morlet · Richard Kronland-Martinet

1. Origins — From Seismology to Song

The wavelet transform was born from frustration with Fourier analysis. Jean Morlet, exploring seismic signals at Elf Aquitaine in the early 1980s, needed a method that could localise both frequency and time. Fourier gave him one but not the other.

1946
Dennis Gabor proposes time-frequency atoms — Gaussian-windowed sinusoids — drawing on ideas from quantum mechanics. The Short-Time Fourier Transform is born, but its fixed window size limits its flexibility.
1980–82
Jean Morlet, a geophysicist at Elf Aquitaine, experiments with "wavelets of constant shape" — analysis functions obtained by dilating a single waveform in time rather than shifting a fixed window in frequency. He struggles to find mathematical rigour for his intuition.
1984
Alex Grossmann, a theoretical physicist at CNRS Marseille-Luminy (Centre de Physique Théorique), provides the mathematical framework. Their landmark paper "Decomposition of Hardy Functions into Square Integrable Wavelets of Constant Shape" formalises the Continuous Wavelet Transform. Pierre Goupillaud joins them for a companion paper on seismic applications.
1985
Richard Kronland-Martinet, working in the computer music group ("Équipe d'Informatique Musicale") at LMA-CNRS Marseille, implements the first real-time wavelet transform for sound using the SYTER audio processor — a custom hardware digital signal processor designed by Jean-François Allouis at GRM (Groupe de Recherches Musicales), Paris. SYTER (Système Temps Réel) used specialised DSP boards running at 100 kHz sample rates to perform transforms impossible on general-purpose computers of the era. Kronland-Martinet programmed SYTER to compute CWT coefficients in real time, producing the first live wavelet-domain analysis and resynthesis of audio. This was the first application of wavelets to audio signals.
1987
Kronland-Martinet, Morlet & Grossmann publish "Analysis of Sound Patterns through Wavelet Transforms" — the foundational paper applying wavelets to speech and music recognition. They also produce a 40-minute demonstration video.
1988
Kronland-Martinet publishes in the Computer Music Journal (MIT Press): "The use of the wavelet transform for the analysis, synthesis and processing of speech and music sounds" — demonstrating time-stretching, pitch-shifting, cross-synthesis and filtering, all performed via wavelet-domain manipulation.
1989
Grossmann, Kronland-Martinet & Morlet publish "Reading and Understanding Continuous Wavelet Transforms" in the landmark Springer volume Wavelets. Kronland-Martinet completes his Doctorat d'État on wavelet-based sound analysis and synthesis. The field explodes: Daubechies, Mallat, and Meyer develop discrete wavelet theory.

What made Marseille unique was the convergence: a theoretical physicist (Grossmann), a geophysicist with engineering intuition (Morlet), and a computer musician who needed real-time tools (Kronland-Martinet) — all working at CNRS laboratories within walking distance of each other on the Luminy campus, nestled in the calanques above the Mediterranean.

2. The Continuous Wavelet Transform

The central idea: instead of comparing a signal to infinite sinusoids (Fourier), compare it to shifted and scaled copies of a single localised waveform — a "wavelet."

$$W(a, b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{\infty} x(t)\, \psi^*\!\left(\frac{t - b}{a}\right) dt$$

Here $a$ is the scale (controlling frequency resolution — large $a$ catches low frequencies, small $a$ catches high), $b$ is the translation (position in time), and $\psi$ is the "mother wavelet." The factor $1/\sqrt{|a|}$ normalises energy across scales.
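To make the formula concrete, here is a minimal numerical sketch in Python (NumPy only; the names `morlet`, `naive_cwt` and the test chirp are illustrative choices for this article, not code from the original papers). It evaluates the integral directly: the Morlet wavelet introduced in section 3 is dilated to each scale and correlated with the signal at every position.

```python
import numpy as np

def morlet(t, w0=6.0):
    """Complex Morlet wavelet: a sinusoid under a Gaussian envelope."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-t ** 2 / 2)

def naive_cwt(x, scales, dt, w0=6.0):
    """Direct evaluation of W(a,b) = (1/sqrt(a)) * integral of x(t) psi*((t-b)/a) dt."""
    W = np.zeros((len(scales), len(x)), dtype=complex)
    for i, a in enumerate(scales):
        # Sample the dilated wavelet on a support wide enough for its Gaussian tail.
        half = int(np.ceil(8 * a / dt))
        t = np.arange(-half, half + 1) * dt
        psi_a = morlet(t / a, w0) / np.sqrt(a)
        # Cross-correlating x with the dilated wavelet gives one row (one scale).
        W[i] = np.convolve(x, np.conj(psi_a[::-1]), mode="same") * dt
    return W

# Example: a linear chirp analysed over logarithmically spaced scales.
fs = 8000.0
t = np.arange(0, 1.0, 1 / fs)
x = np.cos(2 * np.pi * (200 + 400 * t) * t)      # instantaneous frequency 200 Hz -> 1 kHz
scales = np.geomspace(2, 200, 64) / fs           # scales in seconds (~3.8 kHz down to ~38 Hz)
W = naive_cwt(x, scales, 1 / fs)
print(W.shape)                                   # (64, 8000): |W| is the scalogram, angle(W) its phase
```

The nested loop over scales is deliberately transparent rather than fast; FFT-based convolution per scale gives the same coefficients far more quickly.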

Why not just use the STFT?

The Short-Time Fourier Transform uses a window of fixed width. This means you get the same time resolution at every frequency — but musically, a 50 Hz bass note needs a much longer observation window than a 5 kHz transient. The wavelet transform adapts: it automatically uses long windows for low frequencies and short windows for high frequencies. This parallels musical pitch perception: over most of its range, the ear behaves approximately as a constant-Q analyser.

Interactive: Wavelet vs STFT — a chirp signal

Below, a frequency-sweeping chirp is analysed with both methods. The STFT spectrogram (left) smears the chirp across time at low frequencies. The wavelet scalogram (right) tracks it cleanly at all scales.

Left: STFT spectrogram · Right: CWT scalogram (Morlet wavelet) · Drag chirp speed to compare

Interactive: Drag the wavelet probe

Click and drag on the signal below to slide a Morlet wavelet across it. The highlighted cell in the scalogram updates in real time, showing exactly which coefficient is computed at that position and scale.

Top: signal with wavelet overlaid at cursor position · Bottom: scalogram with active cell highlighted — drag horizontally to move, use slider or vertical drag to change scale

The Admissibility Condition

For the transform to be invertible — so we can reconstruct the signal from its wavelet coefficients — the wavelet must satisfy the admissibility condition:

$$C_\psi = \int_0^{\infty} \frac{|\hat{\Psi}(\omega)|^2}{\omega}\, d\omega \;<\; \infty$$

This requires that the wavelet has zero mean (its integral over all time is zero) and that its Fourier transform $\hat{\Psi}(\omega)$ decays sufficiently fast. The standard Morlet wavelet satisfies it only approximately (an exact version adds a small correction term to enforce zero mean), while a plain Gaussian, which has non-zero mean, does not.
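A quick numerical check makes the distinction concrete (a sketch using the closed-form Fourier transforms of a Gaussian envelope and of the zero-mean corrected Morlet, with constant prefactors dropped since they do not affect convergence): the corrected Morlet gives a finite $C_\psi$, while the plain Gaussian's integrand behaves like $1/\omega$ near the origin and the integral keeps growing as the lower limit shrinks.

```python
import numpy as np

w0 = 6.0

def admissibility(psi_hat, omega_min, omega_max=40.0, n=200_000):
    """Approximate C_psi = integral of |psi_hat(w)|^2 / w over [omega_min, omega_max]."""
    omega = np.geomspace(omega_min, omega_max, n)   # dense sampling near zero
    return np.trapz(np.abs(psi_hat(omega)) ** 2 / omega, omega)

# Closed-form Fourier transforms, constant prefactors omitted:
gauss_hat = lambda w: np.exp(-w ** 2 / 2)                              # plain Gaussian: non-zero mean
morlet_hat = lambda w: (np.exp(-(w - w0) ** 2 / 2)
                        - np.exp(-w0 ** 2 / 2) * np.exp(-w ** 2 / 2))  # corrected Morlet: zero mean

for omega_min in (1e-2, 1e-4, 1e-6):
    print(f"omega_min = {omega_min:g}: "
          f"Morlet C_psi ~ {admissibility(morlet_hat, omega_min):.6f}, "
          f"Gaussian 'C_psi' ~ {admissibility(gauss_hat, omega_min):.2f}")
# The Morlet value settles to a finite constant; the Gaussian integral grows roughly
# like log(1/omega_min), so the plain Gaussian is not admissible.
```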

The Reconstruction Formula

$$x(t) = \frac{1}{C_\psi} \iint W(a,b)\, \frac{1}{\sqrt{|a|}}\, \psi\!\left(\frac{t-b}{a}\right) \frac{da\,db}{a^2}$$

This double integral reconstructs the original signal perfectly from its wavelet coefficients. Grossmann and Morlet proved this rigorously using the theory of square-integrable group representations — specifically, the affine group (translation + dilation). This was the mathematical breakthrough that separated wavelets from earlier ad-hoc multiscale methods.

Interactive: Heisenberg Boxes

Every analysis method is constrained by the Heisenberg uncertainty principle: you cannot simultaneously resolve time and frequency arbitrarily precisely. The product of time spread Δt and frequency spread Δω is always ≥ 1/2. The difference lies in how that uncertainty is distributed across frequencies.

Left: STFT — fixed-size tiles at all frequencies · Right: CWT — tiles that are wide in time and narrow in frequency at low frequencies (good frequency resolution), and narrow in time but wide in frequency at high frequencies (good time resolution)

Interactive: Admissibility Condition Explorer

Design a custom wavelet by adjusting its parameters and see whether it satisfies admissibility — whether its Fourier transform decays correctly and its mean is zero.

Left: wavelet shape (real part) · Centre: Fourier magnitude |Ψ(ω)| · Right: admissibility integrand |Ψ(ω)|²/ω — must be finite (green) for reconstruction to be possible

3. The Morlet Wavelet

The wavelet that started it all: a complex sinusoid inside a Gaussian envelope. It is the closest relative of musical tones in the wavelet family.

$$\psi(t) = \pi^{-1/4}\, e^{i\omega_0 t}\, e^{-t^2/2}$$

The parameter $\omega_0$ controls the number of oscillations inside the Gaussian window. A typical choice is $\omega_0 = 5$ or $6$, giving good frequency resolution. Lower values give better time resolution at the cost of frequency precision.

Interactive: Explore the Morlet wavelet

Top: Real part (blue) and envelope (gold) · Bottom: Fourier magnitude — note the bandpass shape

Kronland-Martinet emphasised that the Morlet wavelet is particularly suited to audio because its shape mirrors the structure of musical tones: an oscillation modulated by an amplitude envelope. When dilated, it behaves like a constant-Q filter bank — each scale corresponds to a filter whose bandwidth is a fixed fraction of its centre frequency, just like musical intervals.

Why "constant Q"?

The quality factor Q = f₀/Δf (centre frequency divided by bandwidth) stays constant across all scales. On a piano, each octave has the same number of notes regardless of register. The wavelet transform's log-frequency spacing naturally matches this musical structure, whereas the STFT's linear frequency bins waste resolution on low frequencies and lack resolution at high frequencies for musical applications.
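A short numerical check of the constant-Q claim (a sketch; the sample rate, scale values and the FFT-based bandwidth measurement are this example's choices): dilate the Morlet wavelet to several scales, measure each version's centre frequency and RMS bandwidth from its spectrum, and confirm that their ratio barely moves.

```python
import numpy as np

w0 = 6.0
fs = 44100.0
dt = 1 / fs

def dilated_morlet_spectrum(a, n=1 << 16):
    """FFT magnitude of the Morlet wavelet dilated to scale a (a in seconds)."""
    t = (np.arange(n) - n // 2) * dt
    psi = np.pi ** -0.25 * np.exp(1j * w0 * t / a) * np.exp(-(t / a) ** 2 / 2) / np.sqrt(a)
    freqs = np.fft.fftfreq(n, dt)
    spec = np.abs(np.fft.fft(psi))
    keep = freqs > 0
    return freqs[keep], spec[keep]

for a in (0.5e-3, 1e-3, 2e-3, 4e-3):              # scales in seconds
    f, S = dilated_morlet_spectrum(a)
    p = S ** 2 / np.sum(S ** 2)                   # treat the power spectrum as a distribution
    f_c = np.sum(p * f)                           # centre frequency
    bw = np.sqrt(np.sum(p * (f - f_c) ** 2))      # RMS bandwidth
    print(f"a = {a * 1e3:3.1f} ms   f_c = {f_c:7.1f} Hz   bw = {bw:6.1f} Hz   Q = {f_c / bw:.2f}")
# Doubling the scale halves the centre frequency and the bandwidth together: Q stays put.
```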

Wavelet Family Comparison

The Morlet wavelet is one member of a broader family. Each has different trade-offs between time localisation, frequency localisation, oscillation count, and admissibility.

Morlet · Mexican Hat · Gabor (real) · Notelet
Morlet: complex Gaussian-modulated sinusoid — analytic, constant-Q, the natural choice for audio

4. Sound Analysis — Seeing Music

Kronland-Martinet's 1987 and 1988 papers demonstrated that wavelet scalograms reveal features of musical sounds that spectrograms obscure: attack transients, vibrato trajectories, the onset structure of piano notes, and the formant evolution of speech.

Interactive: Build a scalogram

Choose a signal type and watch the CWT scalogram build row by row. Each row is computed by sliding the Morlet wavelet at a particular scale across the signal and recording the correlation at each position.

Top: signal waveform · Bottom: CWT scalogram (scale vs time, colour = |W(a,b)| or ∠W(a,b)) · Piano roll overlay shows expected note/harmonic positions

What Kronland-Martinet saw

In their analysis of piano sounds, the wavelet representation revealed that the attack portion — the first few tens of milliseconds — contains a broadband burst that excites all harmonics simultaneously, followed by a gradual decay where each partial has its own damping rate. Higher harmonics decay faster, which is why a piano note "mellows" over time. The wavelet scalogram captures this evolution naturally, while a single FFT frame cannot.

For speech, they showed that the wavelet transform tracks the rapid formant transitions during consonant-vowel boundaries with far greater temporal precision than the STFT, because the high-frequency wavelets (small scales) provide millisecond-level time resolution.

Instantaneous Frequency from the Ridge

Grossmann and Kronland-Martinet showed that for signals with slowly varying frequency, the ridge of the wavelet transform — the curve of maximum magnitude across scales at each time — gives a direct estimate of the instantaneous frequency. This became a powerful tool for tracking pitch contours, vibrato, and glissandi in musical signals.

Scalogram with extracted ridge (white line) tracking instantaneous frequency of a vibrato tone
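A minimal sketch of ridge extraction, assuming PyWavelets is available (the `cmor1.5-1.0` wavelet name, the scale grid and the synthetic vibrato tone are this example's choices, not the original SYTER implementation): at each time step, take the scale of maximum magnitude; the corresponding frequency traces the vibrato, and the phase derivative along the ridge refines the estimate.

```python
import numpy as np
import pywt

fs = 8000.0
t = np.arange(0, 1.0, 1 / fs)
# A 440 Hz tone with 6 Hz vibrato of +/- 20 Hz depth.
phase = 2 * np.pi * (440 * t - (20 / 6) * np.cos(2 * np.pi * 6 * t))
x = np.sin(phase)

# Complex Morlet CWT; for 'cmorB-C' with C = 1.0, frequency ~ fs / scale.
scales = fs / np.geomspace(200, 900, 96)
W, freqs = pywt.cwt(x, scales, "cmor1.5-1.0", sampling_period=1 / fs)

# Ridge: at each time step, the row (scale) where |W| is largest.
ridge_rows = np.argmax(np.abs(W), axis=0)
ridge_freq = freqs[ridge_rows]                     # coarse instantaneous-frequency estimate

# Finer estimate: derivative of the unwrapped phase along the ridge.
ridge_phase = np.angle(W[ridge_rows, np.arange(len(x))])
inst_freq = np.gradient(np.unwrap(ridge_phase)) * fs / (2 * np.pi)

print(ridge_freq[::1000].round(1))                 # hovers around 440 Hz, swinging +/- 20 Hz
```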

5. Wavelet Synthesis — From Analysis Back to Sound

The reconstruction formula can be applied to modified wavelet coefficients to produce a transformed sound — but not every modification is legitimate. Kronland-Martinet therefore worked with carefully chosen manipulations, most notably additive resynthesis from the instantaneous amplitude and frequency extracted along each ridge.

A crucial subtlety: the CWT maps signals from $L^2(\mathbb{R})$ into a reproducing kernel Hilbert space — a proper subspace of $L^2(\mathbb{R}^2)$. Not every function on the time-scale half-plane is the CWT of a real signal; valid coefficients must satisfy an internal consistency governed by the reproducing kernel $K(a,b;\,a',b') = \langle \psi_{a,b},\,\psi_{a',b'}\rangle$. Arbitrarily modified coefficients generally fall outside this subspace. Applying the reconstruction formula still yields some signal, but the inverse implicitly projects back onto the valid subspace, potentially distorting the intended transformation. Kronland-Martinet emphasised that for a given reconstruction kernel, not all transforms are admissible — the modification must respect the structure of the representation.

Using the complex Morlet wavelet, each ridge of the scalogram can be interpreted as a single partial (a sinusoidal component) of the sound. The modulus at the ridge gives the instantaneous amplitude, and the phase derivative gives the instantaneous frequency. By extracting these for each partial and resynthesising with oscillators, one obtains a high-quality reconstruction that can also be manipulated.
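The oscillator-bank resynthesis step can be sketched as follows (the `amp` and `freq` arrays stand in for the instantaneous amplitude and frequency extracted from one ridge, as in the previous sketch; the helper name is illustrative):

```python
import numpy as np

fs = 8000.0
t = np.arange(8000) / fs

# Stand-ins for one partial's ridge data; in practice these come from |W| and the
# phase derivative along the ridge.
amp = np.exp(-3 * t)                               # decaying amplitude envelope
freq = 440 + 20 * np.sin(2 * np.pi * 6 * t)        # 440 Hz carrier with 6 Hz vibrato

def resynth_partial(amp, freq, fs):
    """Drive one sinusoidal oscillator with time-varying amplitude and frequency."""
    phase = 2 * np.pi * np.cumsum(freq) / fs       # integrate frequency to get phase
    return amp * np.cos(phase)

partial = resynth_partial(amp, freq, fs)
# Summing resynth_partial(...) over every extracted ridge rebuilds the sound, and
# editing amp/freq before summing gives controlled transformations.
print(partial[:4])
```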

Interactive: Additive resynthesis from wavelet analysis

Top: Original harmonic signal · Middle: extracted partials from CWT ridges · Bottom: resynthesised waveform

Kronland-Martinet noted that wavelet-based resynthesis has a key advantage over Fourier-based methods: because the wavelet analysis adapts its time resolution to frequency, transients are captured with high temporal precision while sustained tones are captured with high frequency precision. This "best of both worlds" is exactly what music demands.

6. Detection of Abrupt Changes

In 1987, Grossmann, Holschneider, Kronland-Martinet and Morlet published a paper on detecting abrupt changes in sound signals using wavelet transforms — one of the earliest works on wavelet-based edge detection in signals.

A sharp transient — a click, a note onset, a consonant burst — creates a distinctive pattern in the wavelet domain: a vertical line of high energy across all scales, because the discontinuity excites wavelets at every frequency simultaneously. This "cone of influence" spreads outward from the transient's location, wider at lower frequencies (larger scales) and narrower at higher frequencies.
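One way to turn this observation into a detector, sketched here with PyWavelets (the scale grid, the per-scale normalisation and the threshold are this example's choices, not the 1987 paper's exact method): sum the coefficient magnitudes across scales at each instant and look for peaks.

```python
import numpy as np
import pywt

fs = 8000.0
t = np.arange(0, 1.0, 1 / fs)
x = 0.3 * np.sin(2 * np.pi * 330 * t)              # sustained tone
for onset in (0.25, 0.62):                         # plus two sharp clicks
    x[int(onset * fs)] += 1.0

scales = fs / np.geomspace(100, 3500, 64)          # 'cmor1.5-1.0': frequency ~ fs / scale
W, freqs = pywt.cwt(x, scales, "cmor1.5-1.0", sampling_period=1 / fs)

# A transient excites every scale at once: sum per-scale-normalised magnitudes over scales.
mag = np.abs(W)
mag /= mag.max(axis=1, keepdims=True) + 1e-12      # equalise rows so the tone does not dominate
novelty = mag.sum(axis=0)

threshold = 5 * np.median(novelty)
onsets = np.flatnonzero(novelty > threshold) / fs
print(np.unique(onsets.round(2)))                  # clusters near 0.25 s and 0.62 s
```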

Interactive: Transient detection

Transients appear as vertical streaks across all scales — their precise timing is revealed

This capability was crucial for music: separating the "attack" portion of a note (which carries perceptual identity) from the "sustain" portion (which carries pitch) is fundamental to understanding timbre. The wavelet transform does this naturally, whereas the STFT forces a compromise between time and frequency resolution that invariably blurs one or the other.

7. Sound Transformations

Kronland-Martinet's 1988 Computer Music Journal paper demonstrated four ground-breaking sound transformations, all performed by manipulating wavelet coefficients before reconstruction. Because the CWT representation is highly redundant, these modifications must be chosen with care — as discussed above, arbitrary changes to coefficients may not respect the reproducing kernel, and the reconstruction will silently project the result back onto the space of valid transforms.

Time-stretching without pitch change

By interpolating the wavelet coefficients along the time axis (stretching the b parameter) while keeping the scale axis unchanged, the sound is slowed down without changing its pitch. This is because the frequency content at each scale is preserved — only the temporal evolution is altered.

Wavelet-domain time stretching — pitch is preserved while duration changes
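A coefficient-domain sketch of the interpolation step described above (illustrative names; a practical implementation also has to handle the coefficient phases before applying the reconstruction formula):

```python
import numpy as np

def stretch_coefficients(W, factor):
    """Resample CWT coefficients along the time (b) axis; the scale axis is untouched.

    Coefficient-domain step only: a usable implementation also rescales the phase of
    each coefficient before applying the reconstruction formula.
    """
    n_scales, n = W.shape
    b_old = np.arange(n)
    b_new = np.linspace(0, n - 1, int(round(n * factor)))
    out = np.empty((n_scales, len(b_new)), dtype=W.dtype)
    for k in range(n_scales):
        # np.interp is real-valued, so interpolate real and imaginary parts separately.
        out[k] = np.interp(b_new, b_old, W[k].real) + 1j * np.interp(b_new, b_old, W[k].imag)
    return out

W = np.random.randn(64, 4000) + 1j * np.random.randn(64, 4000)   # stand-in for a real CWT
print(stretch_coefficients(W, 1.5).shape)                         # (64, 6000): 1.5x slower, same scales
```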

Pitch-shifting without time change

By shifting the wavelet coefficients along the scale axis (moving to different scales) while keeping the time axis unchanged, the pitch is transposed without altering duration.

Wavelet-domain pitch shifting — duration is preserved while pitch changes by the selected interval
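A coefficient-domain sketch of the scale-axis shift (it assumes the rows of the coefficient matrix are log-spaced at twelve per octave, so one row equals one semitone; names are illustrative, and phases would again need adjustment before reconstruction):

```python
import numpy as np

def pitch_shift_coefficients(W, semitones):
    """Move CWT coefficients along the scale axis by a whole number of semitones.

    Assumes row 0 is the smallest scale (highest frequency) and that rows are
    log-spaced at 12 per octave, so one row equals one semitone. Rows shifted in
    from the edge are zeroed.
    """
    shifted = np.zeros_like(W)
    k = int(round(semitones))
    if k > 0:
        shifted[:-k] = W[k:]      # energy moves to smaller scales: pitch goes up
    elif k < 0:
        shifted[-k:] = W[:k]      # energy moves to larger scales: pitch goes down
    else:
        shifted[:] = W
    return shifted

W = np.random.randn(96, 4000) + 1j * np.random.randn(96, 4000)   # stand-in CWT, 12 rows per octave
print(pitch_shift_coefficients(W, 7).shape)                       # up a fifth, duration unchanged
```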

Time-varying filtering

Zeroing out wavelet coefficients at selected scales approximately removes those frequency bands from the signal. The result is not strictly the CWT of any signal (it violates the reproducing-kernel consistency), but the reconstruction projects it onto the nearest valid transform, which in practice gives a good filtering effect. Because the filtering is done in the time-scale plane, it can vary over time — for instance, removing high frequencies only during the sustain while preserving the full-bandwidth attack.

Top: original scalogram · Bottom: time-varying filtered scalogram — filter fades in between the dashed markers (attack preserved, sustain filtered)
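A coefficient-domain sketch of such a time-varying mask (the cutoff, attack time and fade length are illustrative parameters): high-frequency rows are kept during the attack and faded to zero afterwards; reconstruction then projects the masked array back onto a valid transform, as discussed in section 5.

```python
import numpy as np

def attack_preserving_lowpass(W, freqs, fs, attack_s=0.05, fade_s=0.02, cutoff_hz=1500.0):
    """Fade out coefficients above cutoff_hz, but only after the attack has passed.

    freqs gives the centre frequency of each row of W. Coefficient-domain sketch:
    the reconstruction step then projects the masked array back onto a valid transform.
    """
    n_scales, n = W.shape
    t = np.arange(n) / fs
    # Per-sample gain for the filtered band: 1 during the attack, ramping to 0 afterwards.
    gain_t = np.clip(1.0 - (t - attack_s) / fade_s, 0.0, 1.0)
    mask = np.ones((n_scales, n))
    mask[freqs > cutoff_hz, :] = gain_t
    return W * mask

fs = 8000.0
freqs = np.geomspace(4000, 100, 64)                               # row 0 = 4 kHz ... row 63 = 100 Hz
W = np.random.randn(64, 8000) + 1j * np.random.randn(64, 8000)    # stand-in for a real CWT
filtered = attack_preserving_lowpass(W, freqs, fs)
print(np.abs(filtered)[0, ::2000])                                # the 4 kHz row fades to zero after 50 ms
```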

Cross-synthesis

The most striking demonstration: take the spectral envelope from one sound and impose it on another. The result carries the timbral "shape" of the first sound — its formants, resonances, vowel identity — while retaining the temporal structure and pitch of the second. Try "Speech (counting)" with "Rain/texture" for the classic talking-rain effect, or swap magnitude and phase to hear rain that whispers words.

Top: magnitude source scalogram · Middle: hybrid |W_A| × phase(W_B) · Bottom: phase source scalogram — three methods: STFT (raw magnitude swap), Envelope (cepstral smoothing for cleaner formant transfer), CWT (Kronland-Martinet's original wavelet-domain approach)
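A sketch of the plain magnitude/phase swap in the wavelet domain, assuming PyWavelets (the square-wave "voice" and noise "rain" stand-ins, the scale grid and the wavelet name are this example's choices):

```python
import numpy as np
import pywt

fs = 8000.0
t = np.arange(0, 1.0, 1 / fs)
voice = np.sign(np.sin(2 * np.pi * 110 * t)) * np.exp(-2 * t)     # harmonically rich stand-in for speech
rain = 0.3 * np.random.randn(len(t))                              # broadband texture stand-in

scales = fs / np.geomspace(80, 3500, 96)                          # 'cmor1.5-1.0': frequency ~ fs / scale
W_voice, _ = pywt.cwt(voice, scales, "cmor1.5-1.0", sampling_period=1 / fs)
W_rain, _ = pywt.cwt(rain, scales, "cmor1.5-1.0", sampling_period=1 / fs)

# Hybrid coefficients: magnitude (spectral envelope) from the voice, phase (fine
# temporal structure) from the rain.
W_hybrid = np.abs(W_voice) * np.exp(1j * np.angle(W_rain))
# Feeding W_hybrid to the reconstruction formula yields the cross-synthesised sound;
# as noted above, the inverse implicitly projects it onto the nearest valid transform.
print(W_hybrid.shape)
```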

7b. Wavelet Denoising

One of the most powerful practical applications of the wavelet transform: separating signal from noise by selectively suppressing small wavelet coefficients, while leaving large ones untouched.

Broadband noise appears in the wavelet domain as low-magnitude coefficients spread uniformly across all scales and times. Musical signals concentrate their energy in a sparse set of large coefficients. Soft thresholding — zeroing coefficients below a threshold λ and shrinking the rest toward zero — exploits this sparsity to denoise without introducing musical noise artefacts.

$$W_{\text{thresh}}(a,b) = \operatorname{sign}\bigl(W(a,b)\bigr) \cdot \max\bigl(0,\; |W(a,b)| - \lambda\bigr)$$

Donoho and Johnstone (1994) showed that for Gaussian noise with standard deviation σ, the universal threshold $\lambda = \sigma\sqrt{2 \log N}$ is near-optimal. In practice, σ is estimated from the noise level in the finest-scale coefficients, which are dominated by noise.
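A compact sketch of the whole recipe, here using the discrete wavelet transform from PyWavelets because it makes thresholding and exact inversion straightforward (the test signal, the `db8` wavelet, the decomposition level and the noise level are illustrative choices); the CWT version applies the same thresholding rule to the scalogram.

```python
import numpy as np
import pywt

fs = 8000.0
t = np.arange(0, 1.0, 1 / fs)
clean = np.sin(2 * np.pi * 440 * t) * np.exp(-2 * t)
noisy = clean + 0.3 * np.random.randn(len(t))

# Multi-level discrete wavelet decomposition.
coeffs = pywt.wavedec(noisy, "db8", level=6)

# Estimate sigma from the finest-detail coefficients (median absolute deviation),
# then apply the universal threshold lambda = sigma * sqrt(2 log N) to the details.
sigma = np.median(np.abs(coeffs[-1])) / 0.6745
lam = sigma * np.sqrt(2 * np.log(len(noisy)))
denoised_coeffs = [coeffs[0]] + [pywt.threshold(c, lam, mode="soft") for c in coeffs[1:]]
denoised = pywt.waverec(denoised_coeffs, "db8")[: len(noisy)]

snr = lambda ref, sig: 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - sig) ** 2))
print(f"SNR before: {snr(clean, noisy):.1f} dB   after: {snr(clean, denoised):.1f} dB")
```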

Interactive: Denoise a noisy musical signal

Top: clean signal · Middle: noisy signal · Bottom: denoised — drag threshold to see how much noise is removed vs signal distortion

8. Legacy and Continuing Impact

The Marseille group's work rippled outward in every direction — from pure mathematics to MP3 compression, from medical imaging to gravitational wave detection.

Kronland-Martinet continues as Director of Research at the PRISM laboratory (CNRS / Aix-Marseille University), where his group now works on sound perception, cognitive aspects of timbre, and physically-informed synthesis models. The thread from 1985 continues unbroken: wavelets revealed the structure of sound; the question now is how that structure maps to human perception and cognition.

Alex Grossmann (1930–2019) lived to see wavelets become one of the most widely used tools in applied mathematics. Born in Zagreb, a refugee from wartime Croatia, he found his way through Princeton, Brandeis, and the Courant Institute before settling in Marseille, where his collaboration with Morlet would change signal processing forever.

After Marseille — where wavelets went

The Marseille papers were the seed. What grew from them, in just a decade, reshaped applied mathematics and pushed wavelets into almost every corner of signal processing:

1988
Ingrid Daubechies constructs the first family of orthonormal compactly supported wavelets (Comm. Pure Appl. Math. 41). The continuous transform of Marseille acquires a discrete, critically-sampled cousin with perfect reconstruction — the DWT.
1989
Stéphane Mallat introduces multiresolution analysis and the fast pyramidal DWT algorithm (IEEE PAMI 11), recasting the wavelet transform as a two-channel quadrature-mirror filter bank. The DWT becomes computable in $O(N)$ — even faster than the FFT's $O(N \log N)$.
1990
Yves Meyer, Albert Cohen and Daubechies develop biorthogonal wavelets and characterise smooth orthonormal bases. Meyer's 1992 monograph Wavelets and Operators cements the field; he wins the Abel Prize in 2017 for this body of work.
1993
The FBI adopts a wavelet scalar quantisation standard (WSQ) for its 200 million-print fingerprint archive — 15:1 compression with minuscule loss of forensic detail.
2000
JPEG 2000 standardises image compression around a biorthogonal Cohen–Daubechies–Feauveau 9/7 wavelet. Wavelets become the default for high-end imaging (cinema, medical, satellite).
2012
Mallat's wavelet scattering transform (Comm. Pure Appl. Math. 65) — a hand-designed deep network built from wavelet moduli — shows that translation-invariant, stable representations can be engineered rather than learned.
2015–16
LIGO uses wavelet-based transient detection (Coherent WaveBurst, Omega/Q-transform glitch classification) alongside matched filtering in the first direct detection of gravitational waves. Morlet's analysis tool listens to colliding black holes.
Today
Wavelets live inside audio codecs, ECG and EEG analysis, seismic imaging, high-dynamic-range video, neural scattering networks, turbulence simulation, and Gaia/Euclid astronomical pipelines. The Continuous Wavelet Transform remains the tool of choice wherever non-stationary signals and musical time–frequency structure meet.

The key publications from the Marseille collaboration and its direct descendants remain foundational reading:

Key References

  1. Grossmann, A. & Morlet, J. (1984). Decomposition of Hardy Functions into Square Integrable Wavelets of Constant Shape. SIAM J. Math. Anal. 15, 723–736.
  2. Goupillaud, P., Grossmann, A. & Morlet, J. (1984). Cycle-Octave and Related Transforms in Seismic Signal Analysis. Geoexploration, 23, 85–102.
  3. Kronland-Martinet, R., Morlet, J. & Grossmann, A. (1987). Analysis of Sound Patterns through Wavelet Transforms. Int. J. Pattern Recogn. Artif. Intell., 1(2), 273–302.
  4. Grossmann, A., Holschneider, M., Kronland-Martinet, R. & Morlet, J. (1987). Detection of Abrupt Changes in Sound Signals with the Help of Wavelet Transforms. Advances in Electronics and Electron Physics, Suppl. 19.
  5. Kronland-Martinet, R. (1988). The Use of the Wavelet Transform for the Analysis, Synthesis and Processing of Speech and Music Sounds. Computer Music Journal, MIT Press, 12(4), 11–20.
  6. Grossmann, A., Kronland-Martinet, R. & Morlet, J. (1989). Reading and Understanding Continuous Wavelet Transforms. In Combes, Grossmann & Tchamitchian (Eds.), Wavelets, Springer.
  7. Kronland-Martinet, R. & Grossmann, A. (1990). Application of Time-Frequency and Time-Scale Methods to the Analysis, Synthesis and Transformation of Natural Sounds. In Roads, De Poli & Piccialli (Eds.), Representations of Musical Signals, MIT Press.
  8. Daubechies, I. (1988). Orthonormal Bases of Compactly Supported Wavelets. Comm. Pure Appl. Math., 41(7), 909–996.
  9. Mallat, S. (1989). A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE Trans. Pattern Anal. Mach. Intell., 11(7), 674–693.
  10. Donoho, D. L. & Johnstone, I. M. (1994). Ideal Spatial Adaptation by Wavelet Shrinkage. Biometrika, 81(3), 425–455.
  11. Mallat, S. (2012). Group Invariant Scattering. Comm. Pure Appl. Math., 65(10), 1331–1398.
  12. Klimenko, S. et al. (2016). Method for Detection and Reconstruction of Gravitational Wave Transients with Networks of Advanced Detectors. Phys. Rev. D, 93(4), 042004 — the Coherent WaveBurst wavelet-based pipeline used in LIGO's first detections.