Keywords: Discrete representation learning, Vector quantization, Rate-adaptive quantization, Variable-rate compression
TL;DR: RAQ introduces rate-adaptive codebook adaptation for vector-quantized models, extending discrete representation learning beyond fixed-rate limits.
Abstract: Learning discrete representations with vector quantization (VQ) has emerged as a powerful approach in representation learning across vision, audio, and language. However, most VQ models rely on a single, fixed-rate codebook, requiring extensive retraining to meet new bitrate or efficiency requirements. We introduce Rate-Adaptive Quantization (RAQ), a multi-rate codebook adaptation framework for VQ models. RAQ integrates a lightweight sequence-to-sequence (Seq2Seq) codebook generator with the base VQ model, enabling on-demand codebook adaptation to any target size at inference. Additionally, we provide a clustering-based post-hoc alternative for pre-trained VQ models, suitable when modifying the training pipeline or performing joint training is not feasible. Our experiments demonstrate that RAQ performs effectively across multiple rates and VQ models, often outperforming fixed-rate baselines. This model-agnostic adaptability enables a single system to meet varying bitrate requirements in reconstruction and generation tasks.
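As a rough illustration of the clustering-based post-hoc alternative described in the abstract, the sketch below shrinks a pretrained VQ codebook to a smaller target size by clustering its code vectors and using the centroids as the reduced codebook. This is a minimal sketch under assumed details: the function name `adapt_codebook`, the use of scikit-learn's k-means, and the codebook shapes are illustrative and are not the paper's actual API or method.

```python
# Hypothetical sketch: post-hoc codebook size reduction via clustering.
# Assumes the pretrained codebook is available as a (K, D) array of embeddings.
import numpy as np
from sklearn.cluster import KMeans


def adapt_codebook(codebook: np.ndarray, target_size: int, seed: int = 0) -> np.ndarray:
    """Reduce a pretrained (K, D) codebook to (target_size, D) by clustering
    the original code vectors and returning the cluster centroids as new codes."""
    if target_size >= codebook.shape[0]:
        raise ValueError("target_size must be smaller than the original codebook size")
    km = KMeans(n_clusters=target_size, n_init=10, random_state=seed)
    km.fit(codebook)
    return km.cluster_centers_.astype(codebook.dtype)


# Example: reduce a 512-entry, 64-dimensional codebook to 128 entries (a lower bitrate).
original = np.random.randn(512, 64).astype(np.float32)  # stand-in for a pretrained codebook
reduced = adapt_codebook(original, target_size=128)
print(reduced.shape)  # (128, 64)
```

This only covers reducing the codebook size without retraining; the jointly trained Seq2Seq codebook generator described in the abstract additionally supports adapting to larger codebooks at inference time.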
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 16719