Abstract: The $\beta$-VAE is a foundational framework for unsupervised disentanglement, utilizing a regularization parameter $\beta$ to balance latent factorization against reconstruction fidelity. However, disentanglement performance often exhibits a non-monotonic dependence on $\beta$: standard metrics, such as MIG and SAP, typically peak at intermediate values and deteriorate under stronger regularization. We characterize this phenomenon as informational collapse---an information-theoretic failure in which excessive regularization drives the mutual information between latent variables and ground-truth generative factors toward zero. By analyzing the stationarity conditions in a linear-Gaussian setting, we prove that for $\beta > 1$, alternating optimization induces a spectral contraction of the encoder gain. This leads to an exponential decay of its spectral norm and the subsequent vanishing of latent--factor mutual information. To mitigate this failure mode, we investigate the $\lambda\beta$-VAE, which augments the objective with an auxiliary $L_2$ reconstruction penalty. Our analysis demonstrates that this term modifies the encoder stationarity conditions to counteract spectral decay, thereby stabilizing information flow within the latent representation. Extensive experiments on dSprites, Shapes3D, and MPI3D-real confirm that $\lambda > 0$ enhances the stability of disentanglement and preserves latent informativeness across a significantly broader range of $\beta$, providing a principled justification for dual-parameter regularization in variational inference.
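Based only on the abstract's description, the $\lambda\beta$-VAE objective appears to add a $\lambda$-weighted auxiliary $L_2$ reconstruction penalty on top of the standard $\beta$-VAE loss. The sketch below is an assumed form of that objective, not the paper's definition; all names (`lambda_beta_vae_loss`, `lam`) and the exact weighting are hypothetical.

```python
import numpy as np

def lambda_beta_vae_loss(x, x_hat, mu, logvar, beta=4.0, lam=0.5):
    """Assumed lambda-beta-VAE objective, sketched from the abstract:
    reconstruction + beta * KL(q(z|x) || N(0, I)) + lambda * ||x - x_hat||^2.
    The exact form used in the paper may differ."""
    # Gaussian reconstruction term (negative log-likelihood up to constants).
    recon = np.sum((x - x_hat) ** 2)
    # Closed-form KL divergence between the diagonal-Gaussian posterior
    # q(z|x) = N(mu, diag(exp(logvar))) and the standard-normal prior.
    kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
    # Auxiliary L2 penalty: for lam > 0 the effective reconstruction
    # weight becomes (1 + lam), which pushes back against the spectral
    # contraction of the encoder gain that large beta induces.
    aux = lam * np.sum((x - x_hat) ** 2)
    return recon + beta * kl + aux
```

Under this reading, $\lambda > 0$ simply rescales the reconstruction pressure relative to the KL term, which is consistent with the abstract's claim that the auxiliary term modifies the encoder stationarity conditions to counteract spectral decay.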
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Michael_Minyi_Zhang1
Submission Number: 8636