Neural Audio Codec for Latent Music Representations

Luca A Lanzendörfer; Florian Grötschla; Amir Dellali; Roger Wattenhofer


Published: 10 Oct 2024, Last Modified: 29 Oct 2024

Venue: Audio Imagination: NeurIPS 2024 Workshop

License: CC BY 4.0

Keywords: neural audio codec, residual vector quantization

TL;DR: We introduce a high-fidelity neural audio codec for compressing 44.1 kHz music into discrete or continuous latent representations.

Abstract: Neural audio codecs have become increasingly important for audio compression and, more recently, for creating tokenized representations for generative downstream tasks. Consequently, codec quality plays a crucial role in many applications. In this work, we introduce DisCodec, a high-fidelity neural audio codec for compressing 44.1 kHz music into discrete or continuous latent representations. DisCodec leverages ConvNeXt and attention layers, an affine re-parametrization of the code vectors, and an improved commitment loss for better alignment between codebooks and model embeddings. We compare DisCodec against existing state-of-the-art neural audio codecs and perform a comprehensive ablation of the proposed architecture. We make the DisCodec codebase and model checkpoints available at https://github.com/ETH-DISCO/discodec.
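
For readers unfamiliar with the quantization terms used in the abstract and keywords, the following is a minimal sketch of residual vector quantization with a commitment term. It is not taken from the DisCodec codebase: every function name, the toy codebooks, and the per-dimension form of the affine re-parametrization (code' = scale * code + shift) are assumptions made for illustration. The commitment term shown is the standard mean-squared-error form; the improved variant mentioned in the abstract is not reproduced here.

```python
# Illustrative sketch only: names, shapes, and values are assumptions,
# not the DisCodec implementation.

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def affine_codes(codebook, scale, shift):
    """Assumed affine re-parametrization: code' = scale * code + shift,
    with one scale and shift per dimension shared by the whole codebook."""
    return [[s * c + t for c, s, t in zip(code, scale, shift)]
            for code in codebook]


def rvq_encode(latent, codebooks):
    """Residual vector quantization: each stage quantizes what the
    previous stages left over. Returns (indices, quantized vector)."""
    residual = list(latent)
    quantized = [0.0] * len(latent)
    indices = []
    for codebook in codebooks:
        # Pick the code vector nearest to the current residual.
        idx = min(range(len(codebook)),
                  key=lambda i: sq_dist(residual, codebook[i]))
        code = codebook[idx]
        indices.append(idx)
        quantized = [q + c for q, c in zip(quantized, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, quantized


def commitment_loss(latent, quantized):
    """Standard commitment term: mean squared error between the encoder
    output and its quantized version."""
    return sq_dist(latent, quantized) / len(latent)


# Toy usage: a coarse stage followed by a fine stage, in two dimensions.
coarse = affine_codes([[0.0, 0.0], [1.0, 1.0]],
                      scale=[1.0, 1.0], shift=[0.0, 0.0])
fine = [[0.0, 0.0], [0.25, -0.25]]
indices, quantized = rvq_encode([1.2, 0.8], [coarse, fine])
loss = commitment_loss([1.2, 0.8], quantized)
```

In this toy setting the discrete representation is the list of per-stage indices, while the summed code vectors give the continuous approximation of the latent; a smaller commitment term means the encoder output sits closer to the codes it is assigned to.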

Submission Number: 31
