Abstract: Encoder-based generative models fundamentally rely on the structure of their latent space for high-quality image reconstruction, generation, and semantic manipulation. A multivariate Gaussian latent distribution is often desirable because it is closed under linear transformations. To approximate this, most existing methods impose a standard Gaussian prior via Kullback-Leibler (KL) divergence, which assumes independence among latent components. However, real-world latent representations typically exhibit strong internal correlations, rendering the independence assumption inadequate. In this work, we apply random projection theory to analyze how latent representations deviate from a target multivariate Gaussian distribution. We prove that the normalized third absolute moment in low-dimensional subspaces effectively quantifies such deviations. Building on this result, we propose a regularization method that encourages the latent space to align with a multivariate Gaussian distribution without assuming independence across dimensions. The method is compatible with a wide range of encoder-based architectures and introduces no additional computational overhead. We validate its effectiveness through extensive experiments across diverse models. The results consistently show improvements in generation quality, semantic editability, and alignment with the target latent distribution, demonstrating the practical value of the proposed regularization.
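To make the central quantity concrete, the sketch below shows how a regularizer of the kind the abstract describes could be computed: project latent codes onto random one-dimensional subspaces, standardize, and penalize the deviation of the third absolute moment from its standard-Gaussian value E|g|^3 = 2*sqrt(2/pi) ≈ 1.596. This is not the paper's implementation; the function name `third_moment_loss`, the number of projections, the standardization step, and the squared-error penalty are all illustrative assumptions.

```python
import torch

def third_moment_loss(z, num_projections=64, eps=1e-8):
    """Hypothetical sketch: penalize the deviation of the normalized third
    absolute moment of random 1-D projections of a latent batch `z`
    (shape: batch x dim) from its value under a standard Gaussian."""
    _, dim = z.shape
    # Sample random unit directions (low-dimensional random projections).
    directions = torch.randn(dim, num_projections, device=z.device)
    directions = directions / directions.norm(dim=0, keepdim=True)
    proj = z @ directions                          # (batch, num_projections)
    # Standardize each projection so the moment is location- and scale-free.
    proj = (proj - proj.mean(dim=0)) / (proj.std(dim=0) + eps)
    third = proj.abs().pow(3).mean(dim=0)          # normalized 3rd absolute moment
    target = 2.0 * (2.0 / torch.pi) ** 0.5         # E|g|^3 for g ~ N(0,1), ~1.596
    return ((third - target) ** 2).mean()
```

In a VAE-style setup this term would be added to the reconstruction objective in place of, or alongside, the per-dimension KL term; since it involves only one matrix product and elementwise batch moments, its cost is negligible relative to the encoder and decoder passes, consistent with the overhead claim above.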
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~C.V._Jawahar1
Submission Number: 5242