Keywords: Disentanglement, Non-Linear Independent Component Analysis (NICA), Maximum Mean Discrepancy (MMD), Autoencoder, Representation Learning
TL;DR: Using an MMD loss to regularize the aggregate posterior of the latent space enables learning features with tailored, mutually independent distributions, leading to improved disentangled representations.
Abstract: Learning disentangled representations, in which semantic features are captured by independent latent variables, is dominated by the Variational Autoencoder (VAE), which uses a Kullback-Leibler (KL) penalty to encourage a factorized representation in the latent space. In this paper, we provide direct visual and quantitative evidence that VAE-based methods consistently fail to enforce this target distribution on the aggregate posterior and consequently fall short of a mutually independent representation, which is the training objective of unsupervised disentanglement. We quantify this failure and the resulting entanglement using a stable, unsupervised Latent Predictability Score (LPS). To address this, we propose the Programmable Prior Framework: a non-parametric method built on the Maximum Mean Discrepancy (MMD). We verify that our framework allows practitioners to explicitly sculpt the latent space, achieving (1) state-of-the-art unsupervised statistical independence (measured by LPS), (2) alignment to semantic features through an internal semi-supervised mechanism, and (3) shaping of the aggregate posterior distribution (validated through quantization-aware training), all without reconstruction trade-offs. Ultimately, the framework is unique in providing a reliable, foundational tool for balancing these three training objectives, opening new avenues for model identifiability, interpretability, causal reasoning, and efficient compression.
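The core mechanism described above, matching the aggregate posterior to a target distribution with MMD, can be illustrated with a minimal NumPy sketch. It assumes a Gaussian RBF kernel with a fixed bandwidth and a factorized uniform target prior, and it treats a batch of latent codes as samples from the aggregate posterior. The names `rbf_kernel` and `mmd2_unbiased`, the bandwidth, and the choice of prior are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def rbf_kernel(x, y, bandwidth):
    """Gaussian RBF kernel matrix between rows of x (n, d) and y (m, d)."""
    # Pairwise squared Euclidean distances; clamp tiny negatives from round-off.
    sq = (np.sum(x**2, axis=1)[:, None]
          + np.sum(y**2, axis=1)[None, :]
          - 2.0 * x @ y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(z, p, bandwidth=1.0):
    """Unbiased estimate of squared MMD between samples z (n, d) and p (m, d).

    Here z stands in for latent codes drawn from the aggregate posterior and
    p for samples from the target prior (both are assumptions of this sketch).
    """
    n, m = z.shape[0], p.shape[0]
    k_zz = rbf_kernel(z, z, bandwidth)
    k_pp = rbf_kernel(p, p, bandwidth)
    k_zp = rbf_kernel(z, p, bandwidth)
    # Exclude the diagonal (self-similarity) terms for the unbiased within-sample means.
    term_zz = (k_zz.sum() - np.trace(k_zz)) / (n * (n - 1))
    term_pp = (k_pp.sum() - np.trace(k_pp)) / (m * (m - 1))
    term_zp = k_zp.mean()
    return term_zz + term_pp - 2.0 * term_zp


# Usage: compare two batches of 4-dimensional "latent codes" against a
# factorized uniform prior on [-1, 1]^4 (a hypothetical target).
rng = np.random.default_rng(0)
prior = rng.uniform(-1.0, 1.0, size=(512, 4))

# Independent dimensions drawn from the same factorized distribution.
z_match = rng.uniform(-1.0, 1.0, size=(512, 4))

# Entangled codes: every dimension is uniform on [-1, 1] marginally,
# but all four dimensions are copies of one shared variable.
u = rng.uniform(-1.0, 1.0, size=(512, 1))
z_entangled = np.repeat(u, 4, axis=1)

print("MMD^2 (independent):", mmd2_unbiased(z_match, prior))
print("MMD^2 (entangled):  ", mmd2_unbiased(z_entangled, prior))
```

Because every marginal of `z_entangled` is uniform, a per-dimension comparison would not flag it; MMD on the joint distribution does, since the Gaussian RBF kernel is characteristic. This is why matching the aggregate posterior to a factorized prior also penalizes dependence between latent dimensions. In training, the same estimator would be written in an autodiff framework and added to the reconstruction loss.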
Supplementary Material: zip
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Submission Number: 9937