Unpicking Data at the Seams: Understanding Disentanglement in VAEs

19 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: disentanglement, probabilistic model, generative model, VAE, theory
TL;DR: Disentanglement = factorising the data distribution into independent factors aligned with the latent axes. Diagonal posteriors in a VAE induce the constraints necessary to achieve this factorisation.
Abstract: We give a precise, general account of *disentanglement* for smooth generative models. For a decoder $g:\mathcal{Z}\to\mathcal{X}$ and factorised prior $p(z)=\prod_i p_i(z_i)$, we (i) define disentanglement as *factorisation of the pushforward density* $p_\mu = g_\# p$ into one-dimensional "seam" factors (Def. D1); (ii) prove a canonical factorisation of $p_\mu$; and (iii) show that disentanglement is *equivalent* to two decoder conditions (C1–C2). Furthermore, under these conditions, the seam factors are *identifiable* up to permutation and sign. These results hold for general smooth pushforwards and are independent of VAEs. Specialising to Gaussian VAEs, we use an *exact* identity to show that diagonal posteriors (and $\beta$) promote C1–C2 in expectation, thereby explaining when and why VAEs exhibit disentanglement and how $\beta$ modulates it. Experiments illustrate this mechanism on Gaussian data, dSprites, and CelebA.
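To make the VAE-specific claim concrete, here is a minimal sketch, not the paper's implementation, of the two ingredients the abstract singles out: a diagonal Gaussian posterior $q(z\mid x)=\mathcal{N}(\mu,\mathrm{diag}(\sigma^2))$ and a $\beta$-weighted KL term. The MLP architecture, layer sizes, and the Gaussian reconstruction term are illustrative assumptions.

```python
# Minimal Gaussian VAE sketch (illustrative, not the authors' code).
import torch
import torch.nn as nn

class GaussianVAE(nn.Module):
    def __init__(self, x_dim=64, z_dim=8, h_dim=128):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)        # posterior mean
        self.log_var = nn.Linear(h_dim, z_dim)   # diagonal posterior log-variance
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                 nn.Linear(h_dim, x_dim))  # decoder g: Z -> X

    def forward(self, x):
        h = self.enc(x)
        mu, log_var = self.mu(h), self.log_var(h)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)  # reparameterisation
        return self.dec(z), mu, log_var

def beta_elbo_loss(x, x_hat, mu, log_var, beta=4.0):
    # Gaussian reconstruction term (up to a constant) plus
    # beta * KL(q(z|x) || N(0, I)). Because the posterior covariance is
    # diagonal, the KL decomposes into a sum of per-coordinate terms --
    # the axis-aligned pressure the abstract attributes to conditions C1-C2,
    # with beta scaling its strength.
    recon = ((x - x_hat) ** 2).sum(dim=-1).mean()
    kl = 0.5 * (mu ** 2 + log_var.exp() - log_var - 1).sum(dim=-1).mean()
    return recon + beta * kl
```

In this sketch, increasing `beta` upweights the per-coordinate KL penalty, which is one common reading of how $\beta$ modulates disentanglement; the paper's exact identity makes this connection precise.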
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 22084