On the Value of Infinite Gradients in Variational Autoencoder Models

Bin Dai; Li Kevin Wenliang; David Wipf

On the Value of Infinite Gradients in Variational Autoencoder Models

Bin Dai, Li Kevin Wenliang, David Wipf

Published: 09 Nov 2021, Last Modified: 05 May 2023NeurIPS 2021 SpotlightReaders: Everyone

Keywords: variational autoencoders, sparse representations, latent variable models

TL;DR: We demonstrate that infinite gradients, although perhaps at times difficult to address in practical, can serve a useful role in pruning the latent space of autoencoder-based models.

Abstract: A number of recent studies of continuous variational autoencoder (VAE) models have noted, either directly or indirectly, the tendency of various parameter gradients to drift towards infinity during training. Because such gradients could potentially contribute to numerical instabilities, and are often framed as a problematic phenomena to be avoided, it may be tempting to shift to alternative energy functions that guarantee bounded gradients. But it remains an open question: What might the unintended consequences of such a restriction be? To address this issue, we examine how unbounded gradients relate to the regularization of a broad class of autoencoder-based architectures, including VAE models, as applied to data lying on or near a low-dimensional manifold (e.g., natural images). Our main finding is that, if the ultimate goal is to simultaneously avoid over-regularization (high reconstruction errors, sometimes referred to as posterior collapse) and under-regularization (excessive latent dimensions are not pruned from the model), then an autoencoder-based energy function with infinite gradients around optimal representations is provably required per a certain technical sense which we carefully detail. Given that both over- and under-regularization can directly lead to poor generated sample quality or suboptimal feature selection, this result suggests that heuristic modifications to or constraints on the VAE energy function may at times be ill-advised, and large gradients should be accommodated to the extent possible.

Code Of Conduct: I certify that all co-authors of this work have read and commit to adhering to the NeurIPS Statement on Ethics, Fairness, Inclusivity, and Code of Conduct.

Supplementary Material: pdf

20 Replies

Loading