Abstract: Codebook collapse is a common problem in training deep generative models with
discrete representation spaces like Vector Quantized Variational Autoencoders
(VQ-VAEs). We observe that the same problem arises in discrete
variational autoencoders (dVAEs), an alternative design whose encoder directly
learns a distribution over the codebook embeddings to represent the data. We
hypothesize that using the softmax function to obtain a probability distribution
causes codebook collapse by assigning overconfident probabilities to the
best-matching codebook elements. In this paper, we propose a novel way to
incorporate evidential deep learning (EDL) through hierarchical Bayesian
modeling, instead of softmax, to combat the codebook collapse problem of dVAEs.
In contrast to softmax, EDL lets us explicitly monitor the evidence behind the
probability distribution over the codebook embeddings. Our experiments
using various datasets show that our model, called EdVAE, mitigates codebook
collapse, improves reconstruction performance, and enhances codebook usage
compared to dVAE- and VQ-VAE-based models. Our code can be found at
https://github.com/ituvisionlab/EdVAE.
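To make the softmax-versus-EDL contrast concrete, the following is a minimal PyTorch sketch, not the authors' implementation: names such as `logits`, `evidence`, and `codebook_size` are illustrative, and the Dirichlet parameterization follows standard evidential deep learning rather than EdVAE's exact hierarchical model.

```python
# Minimal sketch: softmax posterior vs. an evidential (Dirichlet) posterior
# over codebook assignments. Illustrative only; see the repository above
# for the actual EdVAE implementation.
import torch
import torch.nn.functional as F

codebook_size = 512
logits = torch.randn(8, codebook_size)  # encoder outputs for a batch of 8

# dVAE-style: softmax commits to a single categorical distribution; large
# logit gaps yield overconfident, near one-hot assignments that can starve
# rarely selected codebook entries (codebook collapse).
softmax_probs = F.softmax(logits, dim=-1)

# Evidential alternative: map logits to non-negative "evidence" and treat it
# as Dirichlet concentration parameters, placing a distribution *over*
# categorical distributions instead of committing to one.
evidence = F.softplus(logits)                         # non-negative evidence
alpha = evidence + 1.0                                # Dirichlet concentration
expected_probs = alpha / alpha.sum(-1, keepdim=True)  # mean assignment

# Total evidence doubles as a confidence measure: low evidence keeps the
# expected assignment close to uniform, spreading updates across the codebook.
uncertainty = codebook_size / alpha.sum(-1)
```

Under this view, overconfidence is no longer implicit in the softmax output but is an explicit, monitorable quantity (the total evidence), which is the property the abstract attributes to the EDL formulation.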