Abstract: Vector quantization (VQ) is a technique for deterministically learning features through discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for high-fidelity reconstruction. However, such hierarchical extensions of VQ-VAE often suffer from the codebook/layer collapse issue, where the codebook is not used efficiently to express the data, which degrades reconstruction accuracy. To mitigate this problem, we propose a novel unified framework for stochastically learning hierarchical discrete representations on the basis of the variational Bayes framework, called the hierarchically quantized variational autoencoder (HQ-VAE). HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and the residual-quantized VAE (RQ-VAE), and provides them with a Bayesian training scheme. Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance. We also validate the applicability of HQ-VAE to a different modality with an audio dataset.
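As a rough illustration of the vector-quantization and residual-quantization steps mentioned in the abstract, the following is a minimal toy sketch in Python/NumPy. It is not the authors' HQ-VAE implementation; the function names `quantize` and `residual_quantize` and the toy setup are hypothetical and only show the nearest-codebook lookup and the coarse-to-fine residual scheme that RQ-style models build on.

```python
# Toy sketch of vector quantization (VQ) and residual quantization (RQ).
# Illustrative only; not the authors' HQ-VAE code.
import numpy as np

def quantize(z, codebook):
    """Map each latent vector in z (N, D) to its nearest codebook entry (K, D)."""
    # Pairwise squared distances between latents and codebook entries.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    idx = d.argmin(axis=1)                                      # discrete codes
    return codebook[idx], idx

def residual_quantize(z, codebooks):
    """RQ-style coarse-to-fine quantization: each level quantizes the residual."""
    residual, levels = z, []
    for cb in codebooks:
        q, idx = quantize(residual, cb)
        levels.append(idx)
        residual = residual - q
    # Reconstruction is the sum of the per-level quantized vectors.
    return z - residual, levels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=(8, 4))                       # toy encoder outputs
    codebooks = [rng.normal(size=(16, 4)) for _ in range(2)]
    z_hat, codes = residual_quantize(z, codebooks)
    print("mean reconstruction error:", np.abs(z - z_hat).mean())
```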
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Please refer to our replies to the reviewers for details. The main revisions are listed below:
[nVD1]: Abstract, Clarify the meaning of "stable training"
[nVD1]: Table 1, Perplexity
[nVD1]: Figures 2 and 3, Error bars
[nVD1]: Section 5.1, Reconstruction metrics, definition of perplexity
[nVD1]: Section 5.2, MUSHRA experiment
[nVD1]: Section 5.3, Empirical comparison of SQ-VAE-2 and RSQ-VAE
[P5Kq]: Section 1, Methodological contributions
[P5Kq]: Section 2.2, Comparison of SQ-VAE with dVAE
[P5Kq]: Section 5, Remark regarding RD curves
[hz6B]: Section 6.2, Benefits of hierarchical model that can reconstruct well
[hz6B]: Appendix D, Numerical results on image generation
[nVD1,P5Kq,hz6B]: Section 6.2, Concluding remarks, potential application of HQ-VAEs
Assigned Action Editor: ~antonio_vergari2
Submission Number: 963