Keywords: Vector Quantization, VQ-VAE, Discrete Representation Learning, Generative model
Abstract: Transformer-based generative models are widely used to generate high-quality images and other continuous data modalities. Despite their widespread adoption, these models frequently exhibit limited creativity, often failing to produce diverse and novel outputs.
Most existing studies of these shortcomings concentrate on enhancing the generative architecture or the training methodology. In contrast, our study shifts the focus to the tokenization process, exploring how discretizing continuous representations into discrete tokens influences the creativity of generative models. Through systematic analysis, we identify a critical phenomenon we term "token representation shrinkage": the collapse of representation diversity within discrete codebook tokens and their continuous latent embeddings under vector quantization, one of the most widely used discrete tokenization methods. Our findings reveal that this shrinkage significantly reduces the creativity of generative models, adversely affecting performance across domains including natural images and real-world medical images.
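For readers unfamiliar with the tokenization step the abstract refers to, the sketch below illustrates a standard VQ-VAE-style vector-quantization layer: continuous encoder outputs are snapped to their nearest codebook entries, which become the discrete tokens. This is a generic, minimal illustration of the technique, not the authors' implementation; the class name, codebook size, and dimensions are placeholder assumptions.

```python
# Minimal sketch of vector quantization as used in VQ-VAE-style tokenizers.
# Illustrative only; names and hyperparameters are placeholders, not the paper's code.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, codebook_size: int = 512, dim: int = 64):
        super().__init__()
        # Continuous codebook embeddings; each row is one discrete token's vector.
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, z: torch.Tensor):
        # z: (batch, dim) continuous encoder outputs.
        # Assign each vector to its nearest codebook entry (its discrete token id).
        distances = torch.cdist(z, self.codebook.weight)  # (batch, codebook_size)
        tokens = distances.argmin(dim=-1)                  # discrete token ids
        quantized = self.codebook(tokens)                  # quantized continuous vectors
        # Straight-through estimator: gradients bypass the non-differentiable argmin.
        quantized = z + (quantized - z).detach()
        return quantized, tokens

# Usage: if many inputs map onto only a few codebook entries, token diversity collapses,
# which is the kind of representation shrinkage the abstract describes.
vq = VectorQuantizer()
z = torch.randn(8, 64)
q, ids = vq(z)
```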
Supplementary Material: zip
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 16091