Keywords: neural audio codec, audio compression, learned compression, scalar quantization, autoregressive latent model, entropy modeling
Abstract: In this work, we study high-fidelity stereo audio compression without residual vector quantization (RVQ). We present preliminary findings on a neural audio codec that compresses general stereo audio (speech, music, and environmental sound) sampled at 44.1 kHz to 13 kbps with minimal loss in audio fidelity. We achieve this compression by combining scalar quantization (SQ) with an autoregressive latent model (ARM), which enables efficient entropy modeling. This approach circumvents the pitfalls of widely used RVQ approaches and, to the best of our knowledge, is the first application of SQ with an ARM to the general audio compression domain.
Submission Number: 7