QJL is 1-bit Compressive Sensing: An Equivalence and Its Consequences for KV Cache Compression in LLMs
Keywords: KV cache compression, 1-bit compressive sensing, QJL, Johnson-Lindenstrauss, rate-distortion, effective rank, TurboQuant, LLM inference
TL;DR: QJL in TurboQuant is equivalent to 1-bit compressive sensing; this transfers CS reconstruction bounds, a matching lower bound, and a rate-distortion theorem to KV cache compression, validated with 53-74% NMSE reductions on real LLMs.
Abstract: We establish a formal equivalence between the Quantized Johnson–Lindenstrauss (QJL) transform of the TurboQuant KV cache compression scheme and the classical 1-bit compressive sensing (1-bit CS) model of Boufounos and Baraniuk (2008), which lets us import 1-bit CS theory into QJL analysis. From it we derive three new consequences. First, reconstruction guarantees for QJL side-channel estimates in terms of measurement count, dimension, and key geometry, with a matching $m \asymp \log(n)/\gamma_n^2$ lower bound via Le Cam/Fano (isotropic-keys model). Second, an analysis of TurboQuant as a two-stage operator (rotated scalar quantization composed with QJL), yielding a composition error identity and a bit-allocation law that explains its deployed configuration. Third, a rate–distortion lower bound identifying the effective rank of the residual covariance as the diagnostic governing multi-bit residual coding. Empirically, KL transform coding cuts residual-reconstruction NMSE by 53–74% over scalar quantization on concentrated-spectrum residuals, and a QJL 1-bit correction stacked on a learned low-rank projection adds ≤0.4 perplexity points across six LLMs, confirming the composition bound end-to-end.
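The measurement model behind the equivalence can be illustrated with a minimal sketch. It assumes a dense Gaussian sketch matrix `S` with i.i.d. standard normal entries, a single key vector `k`, and a query `q`; the function names and the sizes `n = 64`, `m = 4096` are hypothetical choices for illustration and are not taken from the paper or from TurboQuant. The key is stored as the sign pattern `sign(S k)` plus its norm, which is the 1-bit CS measurement, and the inner product with a query is estimated using the standard Gaussian identity E[(s·q) sign(s·k)] = sqrt(2/π) ⟨q, k⟩ / ‖k‖.

```python
import numpy as np

def one_bit_measure(S, k):
    """1-bit CS measurement y = sign(S k), stored as +/-1 values."""
    return np.where(S @ k >= 0, 1.0, -1.0)

def qjl_inner_product(S, q, k_bits, k_norm):
    """Estimate <q, k> from the key's sign bits and its norm.

    Relies on E[(s.q) * sign(s.k)] = sqrt(2/pi) * <q, k> / ||k||
    for a standard Gaussian row s (an assumption of this sketch).
    """
    m = S.shape[0]
    return np.sqrt(np.pi / 2) / m * k_norm * np.dot(S @ q, k_bits)

def one_bit_direction(S, k_bits):
    """Simple 1-bit CS direction estimate: normalized back-projection S^T y."""
    d = S.T @ k_bits
    return d / np.linalg.norm(d)

# Illustrative sizes (hypothetical): dimension n, measurement count m.
rng = np.random.default_rng(0)
n, m = 64, 4096
S = rng.standard_normal((m, n))   # Gaussian sketch matrix (assumed)
k = rng.standard_normal(n)        # key vector
q = rng.standard_normal(n)        # query vector

bits = one_bit_measure(S, k)
est = qjl_inner_product(S, q, bits, np.linalg.norm(k))
exact = float(q @ k)
cosine = float(one_bit_direction(S, bits) @ k / np.linalg.norm(k))

print(f"exact <q,k> = {exact:.3f}, 1-bit estimate = {est:.3f}")
print(f"cosine between recovered and true key direction = {cosine:.3f}")
```

Only the direction of `k` survives the sign quantization, so the norm has to be stored separately; increasing `m` tightens both the inner-product estimate and the recovered direction.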
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 25