HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes

TMLR Paper963 Authors

18 Mar 2023 (modified: 28 Jun 2023). Rejected by TMLR.
Abstract: Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly achieved with a variational autoencoding model, VQ-VAE, which has been further extended to hierarchical structures for high-fidelity reconstruction. However, hierarchical extensions of VQ-VAE often suffer from the codebook/layer collapse issue, in which the codebook is not used efficiently to express the data, which in turn degrades reconstruction accuracy. To mitigate this problem, we propose a novel unified framework to stochastically learn hierarchical discrete representations on the basis of the variational Bayes framework, called the hierarchically quantized variational autoencoder (HQ-VAE). HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE), and provides them with a Bayesian training scheme. Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance. We also validate the applicability of HQ-VAE to a different modality using an audio dataset.
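To make the notions of deterministic quantization and codebook usage concrete, the following is a minimal sketch and not the authors' method: it performs plain nearest-neighbor vector quantization and measures codebook usage as perplexity, taken here to be the exponential of the entropy of the empirical code frequencies. That definition is an assumption (a common convention); the definition used in the paper is the one given in its Section 5.1. The names `quantize` and `perplexity` and the toy codebook are hypothetical.

```python
import math
from collections import Counter


def quantize(vectors, codebook):
    """Return, for each vector, the index of its nearest codebook entry
    under squared Euclidean distance (deterministic VQ assignment)."""
    indices = []
    for v in vectors:
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def perplexity(indices):
    """Exponential of the entropy of the empirical code usage.
    Equals K when exactly K codes are used with equal frequency."""
    counts = Counter(indices)
    n = len(indices)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return math.exp(entropy)


# Toy data (hypothetical): the third code is far from every input,
# so it is never selected.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
vectors = [(0.1, -0.2), (0.9, 1.2), (0.2, 0.1), (1.1, 0.8)]

indices = quantize(vectors, codebook)
usage = perplexity(indices)
```

In this toy case only two of the three codes are ever assigned, so the perplexity falls below the codebook size. A persistent gap of this kind is the symptom that the abstract calls codebook collapse.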
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Please refer to our replies to the reviewers for the details.
[nVD1]: Abstract, Clarify the meaning of "stable training"
[nVD1]: Table 1, Perplexity
[nVD1]: Figures 2 and 3, Error bars
[nVD1]: Section 5.1, Reconstruction metrics, definition of perplexity
[nVD1]: Section 5.2, MUSHRA experiment
[nVD1]: Section 5.3, Empirical comparison of SQ-VAE-2 and RSQ-VAE
[P5Kq]: Section 1, Methodological contributions
[P5Kq]: Section 2.2, Comparison of SQ-VAE with dVAE
[P5Kq]: Section 5, Remark regarding RD curves
[hz6B]: Section 6.2, Benefits of hierarchical model that can reconstruct well
[hz6B]: Appendix D, Numerical results on image generation
[nVD1,P5Kq,hz6B]: Section 6.2, Concluding remarks, potential application of HQ-VAEs
Assigned Action Editor: ~antonio_vergari2
Submission Number: 963