Pareto Variational Autoencoder

Published: 26 Jan 2026, Last Modified: 06 May 2026 | ICLR 2026 Poster | CC BY 4.0
Keywords: Variational Autoencoder, Symmetric Pareto distribution, Information geometry, Heavy-tail learning
Abstract: This paper introduces a new class of multivariate power-law distributions, the symmetric Pareto (symPareto) distribution, which can be viewed as an $\ell_1$-norm-based counterpart of the multivariate $t$ distribution. The motivation is to capture the heavy tails of the target distribution in generative modeling and to bring robustness to noise in downstream tasks such as image denoising. The symPareto distribution possesses many attractive information-geometric properties with respect to the $\gamma$-power divergence, a natural alternative for power families to the Kullback-Leibler divergence that lies at the core of conventional variational autoencoder (VAE) models. Leveraging the joint minimization view of variational inference, this paper proposes the ParetoVAE, a probabilistic autoencoder that minimizes the $\gamma$-power divergence between two statistical manifolds. ParetoVAE employs the symPareto distribution for both prior and encoder, with flexible decoder options including multivariate $t$ and symPareto distributions. Empirical evidence demonstrates the effectiveness of ParetoVAE across multiple domains by varying the decoder type. The $t$ decoder achieves superior performance in sparse, heavy-tailed data reconstruction and word frequency analysis; the symPareto decoder enables robust high-dimensional denoising.
Supplementary Material: zip
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 19557