Neural Audio Compression without Residual Vector Quantization

Till Aczel; Luca A Lanzendörfer; Fei Gao; Roger Wattenhofer

Published: 26 Jan 2026, Last Modified: 26 Jan 2026
Venue: AAAI 2026 Workshop on ML4Wireless (Poster)
License: CC BY 4.0

Keywords: neural audio codec, audio compression, learned compression, scalar quantization, autoregressive latent model, entropy modeling

Abstract: In this work, we study high-fidelity stereo audio compression without residual vector quantization (RVQ). We present preliminary findings on a neural audio codec that compresses general stereo audio (speech, music, and environmental sounds) sampled at 44.1 kHz to 13 kbps with minimal loss in audio fidelity. The codec combines scalar quantization (SQ) with an autoregressive latent model (ARM), which enables efficient entropy modeling. This approach circumvents the pitfalls of widely used RVQ approaches and, to the best of our knowledge, is the first application of SQ with an ARM to general audio compression.
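To make the SQ-plus-ARM idea concrete, the following is a minimal Python sketch and not the authors' implementation. It rounds latents to an integer grid (scalar quantization) and estimates the ideal code length of the resulting symbols under an autoregressive probability model. Every name here is hypothetical: `toy_context_model` is a hand-written stand-in for a learned ARM, the discretized logistic distribution is an assumed choice of symbol distribution, and the 16-bit PCM baseline in `pcm_kbps` is an assumption the abstract does not state.

```python
import math

def scalar_quantize(latents, step=1.0):
    """Scalar quantization: map each latent to the index of the nearest grid point.
    Note: Python's round() uses round-half-to-even for exact ties."""
    return [int(round(x / step)) for x in latents]

def discretized_logistic_pmf(k, mean, scale):
    """Probability of integer symbol k: logistic density integrated over [k-0.5, k+0.5].
    (Assumed distribution family; adequate for the small values used here.)"""
    def cdf(v):
        return 1.0 / (1.0 + math.exp(-(v - mean) / scale))
    return cdf(k + 0.5) - cdf(k - 0.5)

def toy_context_model(prev_symbol):
    """Hypothetical stand-in for a learned ARM: predicts the next symbol's
    distribution parameters from the previously decoded symbol."""
    return float(prev_symbol), 1.0  # (mean, scale)

def estimated_bits(symbols):
    """Ideal code length in bits: sum of -log2 p(symbol | previous symbol).
    An entropy coder driven by the same model would approach this length."""
    total, prev = 0.0, 0
    for s in symbols:
        mean, scale = toy_context_model(prev)
        p = max(discretized_logistic_pmf(s, mean, scale), 1e-12)  # avoid log(0)
        total += -math.log2(p)
        prev = s
    return total

def pcm_kbps(sample_rate_hz, channels, bits_per_sample):
    """Bitrate of uncompressed PCM audio in kbps (bit depth is an assumption)."""
    return sample_rate_hz * channels * bits_per_sample / 1000.0

if __name__ == "__main__":
    symbols = scalar_quantize([0.2, 1.1, 1.9, 3.2, 3.8])
    print("symbols:", symbols)
    print("estimated bits:", estimated_bits(symbols))
    print("16-bit stereo PCM at 44.1 kHz, kbps:", pcm_kbps(44100, 2, 16))
```

In this toy setup, sequences that the context model predicts well cost fewer bits than erratic ones, which is the property an entropy model exploits. Under the assumed 16-bit baseline, uncompressed 44.1 kHz stereo is 1411.2 kbps, so 13 kbps would correspond to a reduction of roughly two orders of magnitude.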

Submission Number: 7
