Keywords: vector quantization, representation collapse
TL;DR: We identify dimensional collapse in VQVAEs, where the codebook's effective dimensionality is surprisingly low, and investigate its implications and potential remedies.
Abstract: Vector-Quantized Variational Autoencoders (VQVAEs) have enabled strong performance in generative modeling by mapping continuous data to a discrete set of learnable codes.
In this work, we identify a surprising yet consistent phenomenon that we term \emph{dimensional collapse}: despite using high-dimensional embeddings, VQVAEs tend to compress their representations into a much smaller subspace, typically only 4 to 10 dimensions.
We provide an in-depth analysis of this phenomenon and reveal its relation to model performance and learning dynamics.
Interestingly, VQVAEs naturally gravitate toward this low-dimensional regime, and enforcing higher-dimensional usage (e.g., via rank regularization) can degrade performance.
To overcome this low-dimensionality limitation, we propose \textbf{Divide-and-Conquer VQ (DCVQ)}, which partitions the latent space into multiple low-dimensional subspaces, each quantized independently.
By design, each subspace respects the model’s preference for low dimensionality, while their combination expands the overall capacity.
Our results show that DCVQ overcomes the inherent dimensional bottleneck and achieves improved reconstruction quality across image datasets.
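To make the mechanism concrete, here is a minimal sketch of a divide-and-conquer quantizer in the style the abstract describes: the latent vector is split into several low-dimensional groups, each quantized against its own small codebook, and the results are concatenated. All names, shapes, and the straight-through gradient detail are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class DivideAndConquerVQ(nn.Module):
    """Sketch: partition the latent space into independent low-dim subspaces."""

    def __init__(self, latent_dim=64, num_groups=8, codebook_size=256):
        super().__init__()
        assert latent_dim % num_groups == 0
        self.num_groups = num_groups
        self.sub_dim = latent_dim // num_groups  # e.g., 8 dims per group
        # One independent codebook per low-dimensional subspace.
        self.codebooks = nn.ModuleList(
            nn.Embedding(codebook_size, self.sub_dim)
            for _ in range(num_groups)
        )

    def forward(self, z):
        # z: (batch, latent_dim) continuous encoder output.
        chunks = z.chunk(self.num_groups, dim=-1)
        quantized = []
        for chunk, codebook in zip(chunks, self.codebooks):
            # Nearest-neighbor lookup restricted to this subspace.
            dists = torch.cdist(chunk, codebook.weight)  # (batch, codebook_size)
            idx = dists.argmin(dim=-1)                   # (batch,)
            q = codebook(idx)                            # (batch, sub_dim)
            # Straight-through estimator so gradients reach the encoder.
            quantized.append(chunk + (q - chunk).detach())
        return torch.cat(quantized, dim=-1)              # (batch, latent_dim)
```

With the assumed settings (latent_dim=64, num_groups=8), each group quantizes an 8-dimensional subspace, inside the 4-to-10-dimensional regime the abstract reports, while the concatenated code retains the full 64-dimensional capacity.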
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 17837