Keywords: Vector Quantization, VQ-VAE
TL;DR: Training-free conversion of a Gaussian VAE into a VQ-VAE can be achieved with provably similar rate-distortion performance.
Abstract: The vector quantized variational autoencoder (VQ-VAE) is a discrete autoencoder that compresses images into discrete tokens. It is difficult to train due to the discretization step. In this paper, we propose a simple yet effective technique, dubbed __Gaussian Quant (GQ)__, that converts a Gaussian VAE into a VQ-VAE without any additional training. GQ generates random Gaussian noise as a codebook and quantizes each posterior mean to its closest noise vector. Theoretically, we prove that when the logarithm of the codebook size exceeds the bits-back coding rate of the Gaussian VAE, a small quantization error is guaranteed. Practically, we propose a heuristic for training a Gaussian VAE suited to GQ, named the target divergence constraint (TDC). Empirically, we show that GQ outperforms previous VQ-VAE variants, such as VQ, FSQ, LFQ, and BSQ, on both UNet and ViT architectures. Furthermore, TDC also improves upon previous Gaussian VAE discretization methods, such as TokenBridge. The source code is provided in the supplementary materials.
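The quantization step described in the abstract (sample a random Gaussian codebook, then snap each posterior mean to its nearest codeword) can be sketched as follows. This is a minimal illustration based only on the abstract's description, not the authors' implementation; the function name `gaussian_quant` and all parameters are hypothetical.

```python
import numpy as np

def gaussian_quant(mu, codebook_size=1024, seed=0):
    """Sketch of Gaussian Quant (GQ): quantize a posterior mean vector
    against a fixed random Gaussian codebook (hypothetical interface).

    mu: posterior mean of the Gaussian VAE encoder, shape (dim,).
    Returns the discrete token index and its codeword.
    """
    rng = np.random.default_rng(seed)  # fixed seed -> codebook is shared, needs no training
    codebook = rng.standard_normal((codebook_size, mu.shape[-1]))
    # Nearest-neighbor search in Euclidean distance.
    distances = np.linalg.norm(codebook - mu, axis=-1)
    idx = int(np.argmin(distances))
    return idx, codebook[idx]
```

Per the theoretical claim, the quantization error stays small once `log(codebook_size)` exceeds the bits-back coding rate of the Gaussian VAE.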
Supplementary Material: zip
Primary Area: generative models
Submission Number: 6836