A visual, interactive guide to the frequency transform that hears music the way we do — with logarithmic resolution matching the structure of pitch.
"Constant Q" means the ratio of each frequency bin's centre frequency to its bandwidth is constant. Low notes get wide analysis windows (good frequency resolution); high notes get narrow windows (good time resolution).
This mirrors how musical pitch works. On a piano, the distance from C3 to C4 spans the same perceptual interval as C5 to C6, even though the latter covers four times the Hz range. The CQT spaces its bins logarithmically to match, typically placing 12, 24, 36, or more bins per octave.
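The logarithmic spacing can be made concrete with a short sketch. It assumes equal-tempered spacing, fk = fmin · 2^(k/B), with fmin set to C3 (about 130.81 Hz) and B = 12 bins per octave. The function name and these parameter values are illustrative choices, not something the text above prescribes.

```python
def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced centre frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]

# Assumed values: start at C3 (~130.81 Hz), 12 bins per octave, 3 octaves plus one bin.
freqs = cqt_center_frequencies(130.81, 12, 37)

c3, c4, c5, c6 = freqs[0], freqs[12], freqs[24], freqs[36]
print(f"C3 to C4 spans {c4 - c3:.2f} Hz")
print(f"C5 to C6 spans {c6 - c5:.2f} Hz")
print(f"ratio of the two spans: {(c6 - c5) / (c4 - c3):.1f}")
```

Both octaves cover exactly 12 bins, even though the upper one is four times wider in Hz.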
The FFT uses linearly-spaced frequency bins. Musical notes are logarithmically spaced. The mismatch makes pitch analysis with the FFT awkward — the CQT solves this directly.
Consider a signal containing notes at C3, E4, and G5. In the FFT, these three notes are unevenly spread across the spectrum, and the low notes land in very few bins, so neighbouring low pitches are hard to tell apart. In the CQT, each note occupies the same width on the log-frequency axis.
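To put rough numbers on that comparison, here is a small sketch. The sample rate (44.1 kHz), FFT size (2048), lowest CQT bin (C2) and 12 bins per octave are all assumed values picked for illustration, and the note frequencies assume equal temperament with A4 = 440 Hz.

```python
import math

FS = 44100          # assumed sample rate (Hz)
N_FFT = 2048        # assumed FFT size, giving a bin width of FS / N_FFT (about 21.5 Hz)
F_MIN = 65.41       # assumed lowest CQT bin, C2 (Hz)
B = 12              # assumed CQT bins per octave

notes = {"C3": 130.81, "E4": 329.63, "G5": 783.99}

def semitone_width_in_fft_bins(f):
    """Distance from f to the next semitone above it, measured in FFT bins."""
    return f * (2 ** (1 / 12) - 1) / (FS / N_FFT)

def cqt_bin(f):
    """Fractional CQT bin index of frequency f on the log-frequency axis."""
    return B * math.log2(f / F_MIN)

for name, f in notes.items():
    print(f"{name}: FFT bin {f / (FS / N_FFT):6.2f}, "
          f"one semitone = {semitone_width_in_fft_bins(f):.2f} FFT bins, "
          f"CQT bin {cqt_bin(f):5.2f}")
```

With these settings a semitone near C3 is narrower than a single FFT bin, while a semitone near G5 spans more than two. On the CQT axis a semitone is one bin everywhere.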
The CQT maps frequency to a logarithmic axis where octaves are equally spaced. This matches the Western chromatic scale, in which every semitone is the same frequency ratio, and it closely approximates human pitch perception.
Each CQT bin uses a different-length analysis window. Low-frequency bins use long windows for precise pitch resolution; high-frequency bins use short windows for precise timing.
The window length for bin k is Nk = Q · fs / fk, where fs is the sample rate and fk is the bin's centre frequency. This is the heart of the constant-Q property: the window always contains the same number of oscillation cycles (Q of them), regardless of frequency.
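A quick numerical check of that formula. The sketch assumes the common choice Q = 1 / (2^(1/B) − 1), which makes each bin's bandwidth equal to the spacing to the next bin, and a 44.1 kHz sample rate. Neither value is fixed by the text above.

```python
def q_factor(bins_per_octave):
    """Assumed choice of Q: bandwidth equals the spacing between adjacent bins."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)

def window_length(q, fs, f_k):
    """N_k = Q * fs / f_k, in (fractional) samples."""
    return q * fs / f_k

FS = 44100            # assumed sample rate (Hz)
Q = q_factor(12)      # 12 bins per octave

for name, f_k in [("C3", 130.81), ("C4", 261.62), ("C5", 523.24), ("C6", 1046.48)]:
    n_k = window_length(Q, FS, f_k)
    cycles = n_k * f_k / FS       # cycles of f_k that fit inside the window
    print(f"{name}: N_k = {n_k:8.1f} samples, {cycles:.2f} cycles")
```

Every octave up halves the window length, while the number of cycles inside the window stays at Q.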
Compute the window length Nk for each bin from Q and fk
Generate a windowed complex exponential at fk
Multiply the signal by this kernel and sum — that's one CQT coefficient
Repeat for all K bins across all time frames
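The four steps above can be sketched directly in plain Python. This is a minimal, deliberately slow illustration rather than a reference implementation: the Hann window, the 1/Nk normalisation, the choice Q = 1 / (2^(1/B) − 1), left-aligned frames, and every function and parameter name are assumptions made for the example.

```python
import cmath
import math

def q_factor(bins_per_octave):
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)   # assumed choice of Q

def naive_cqt_frame(x, fs, f_min, bins_per_octave, n_bins):
    """One frame of a direct CQT, using the samples at the start of x."""
    q = q_factor(bins_per_octave)
    coeffs = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        n_k = math.ceil(q * fs / f_k)            # step 1: window length for this bin
        if n_k > len(x):
            raise ValueError("frame is shorter than the longest window")
        acc = 0j
        for n in range(n_k):
            # step 2: Hann-windowed complex exponential at f_k
            w = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_k)
            kernel = w * cmath.exp(-2j * math.pi * f_k * n / fs)
            acc += x[n] * kernel                 # step 3: multiply and sum
        coeffs.append(acc / n_k)
    return coeffs

def naive_cqt(x, fs, f_min, bins_per_octave, n_bins, hop):
    """Step 4: repeat the frame computation every `hop` samples."""
    longest = math.ceil(q_factor(bins_per_octave) * fs / f_min)
    return [naive_cqt_frame(x[start:], fs, f_min, bins_per_octave, n_bins)
            for start in range(0, len(x) - longest + 1, hop)]

fs = 8000          # assumed sample rate, kept low so the pure-Python loops stay fast
f_min = 130.81     # C3
tone = [math.sin(2.0 * math.pi * 2.0 * f_min * n / fs) for n in range(2048)]  # C4

frames = naive_cqt(tone, fs, f_min, 12, 24, hop=512)
peak_bin = max(range(24), key=lambda k: abs(frames[0][k]))
print("strongest bin:", peak_bin)   # bin 12 sits one octave above f_min
```

The inner loop runs Nk times per bin, which is why the low bins dominate the cost.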
The naive CQT is expensive: roughly O(KN) operations per frame for K bins with windows of up to N samples. Efficient algorithms use the FFT as a backbone, either via spectral kernels or recursive downsampling.
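In its standard form, the coefficient for bin k is X(k) = (1/Nk) · Σ wk(j) · x(j) · e^(−i·2π·Q·j/Nk), summed over j = 0 … Nk − 1, where x(j) is the signal and wk(j) is the window function of length Nk. The key insight is that each bin sees a different number of signal samples, unlike the FFT where every bin uses the same window.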
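The spectral-kernel idea mentioned above can be sketched with NumPy. The kernels are transformed to the frequency domain once, and each frame then needs one FFT plus a product with those precomputed kernels. By Parseval's relation this gives the same coefficients as the direct sum. The Hann window, the 1/Nk normalisation, the choice of Q, the parameter values and the function name are assumptions for this example, and a real implementation would store the thresholded kernels as a sparse matrix.

```python
import numpy as np

def cqt_kernels(fs, f_min, bins_per_octave, n_bins, n_fft):
    """Rows are windowed complex exponentials, zero-padded to n_fft samples."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)    # assumed choice of Q
    kernels = np.zeros((n_bins, n_fft), dtype=complex)
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        n_k = int(np.ceil(q * fs / f_k))
        n = np.arange(n_k)
        window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / n_k)   # Hann window
        kernels[k, :n_k] = window * np.exp(-2j * np.pi * f_k * n / fs) / n_k
    return kernels

fs, f_min, n_fft = 8000, 130.81, 2048       # assumed values
temporal = cqt_kernels(fs, f_min, 12, 24, n_fft)

# One-off precomputation: move the kernels to the frequency domain.
spectral = np.conj(np.fft.fft(np.conj(temporal), axis=1)) / n_fft

t = np.arange(n_fft) / fs
x = np.sin(2 * np.pi * 2 * f_min * t) + 0.5 * np.sin(2 * np.pi * 3 * f_min * t)

direct = temporal @ x                       # naive sum in the time domain
via_fft = spectral @ np.fft.fft(x)          # same coefficients via one FFT
print("methods agree:", np.allclose(direct, via_fft))

# Each spectral kernel is concentrated around its centre frequency.
threshold = 0.01 * np.abs(spectral).max(axis=1, keepdims=True)
kept = np.abs(spectral) > threshold
print(f"{kept.mean():.1%} of entries exceed 1% of their row's peak")
```

Because only a small share of each spectral kernel is significant, dropping the rest turns the per-frame product into a cheap sparse multiplication, at the cost of a small approximation error.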
Fold the CQT's log-frequency bins into a single octave and you get a chromagram — a 12-dimensional representation of harmonic content, perfect for chord recognition and key detection.
If the CQT has B = 36 bins per octave across 5 octaves (180 bins total), each semitone is covered by 3 adjacent bins. The chromagram sums those 3 bins for each of the 12 chroma classes, then sums across all 5 octaves, collapsing the 180 bins into 12.
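The folding step is short enough to write out. This sketch assumes the bins are laid out so that each run of three consecutive bins belongs to one semitone (for instance, with the lowest bin tuned one bin below the lowest C so the middle bin of each group sits on the note). The function name and the toy input are made up for illustration.

```python
def fold_to_chroma(cqt_mag, bins_per_octave=36, n_chroma=12):
    """Sum CQT magnitudes into pitch classes, across bins and octaves."""
    bins_per_class = bins_per_octave // n_chroma      # 3 bins per semitone
    chroma = [0.0] * n_chroma
    for k, value in enumerate(cqt_mag):
        pitch_class = (k % bins_per_octave) // bins_per_class
        chroma[pitch_class] += value
    return chroma

# Toy input: 5 octaves * 36 bins = 180 magnitudes, all zero except three peaks.
cqt_mag = [0.0] * 180
cqt_mag[1] = 1.0                    # C in the lowest octave (middle bin of its group)
cqt_mag[36 + 1] = 2.0               # C one octave higher
cqt_mag[2 * 36 + 3 * 4 + 1] = 5.0   # E (pitch class 4) in the third octave

chroma = fold_to_chroma(cqt_mag)
print(chroma)
```

Both C peaks end up in chroma class 0 and the E peak in class 4, regardless of which octave they came from.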
The CQT is widely used across modern music information retrieval, audio synthesis, and intelligent audio processing.
Pitch tracking — automatic transcription, melody extraction, tuner apps
Chord recognition — real-time chord detection via CQT → chromagram pipeline
Audio synthesis — phase vocoders with perceptually uniform frequency resolution
Deep learning — CQT spectrograms as input features for music classification, source separation, and generative models