Abstract: While many variants of Variational Autoencoders have been proposed, a unified understanding of them remains elusive. In particular, $\sigma$-VAEs use a scaled identity matrix $\sigma^2 I$ as the decoder variance, while $\beta$-VAEs introduce a hyperparameter $\beta$ to reweight the negative ELBO loss. However, existing learning theories on globally optimal VAEs offer limited practical insight into their empirical success. Moreover, although previous work established the mathematical equivalence of the variance scalar $\sigma$ and the hyperparameter $\beta$ in the loss landscape, $\sigma$ as a learnable model parameter differs fundamentally from $\beta$ as a fixed hyperparameter. This paper presents a comprehensive analysis of the $\sigma$-CVAE, revealing both its expressiveness and its limitations arising from suboptimal variational inference. Focusing on conditional variants, we propose the Calibrated Robust $\sigma$-CVAE, a doubly robust algorithm that ensures reliable $\sigma$ estimation while effectively preventing posterior collapse. Our approach, which leverages functional neural decomposition and KL annealing, provides a unified framework for understanding both $\sigma$-VAEs and $\beta$-VAEs in terms of parameter optimality and training dynamics. Empirical results demonstrate the superior performance of our method across various conditional density estimation tasks, highlighting its value for accurate and reliable probabilistic modeling.
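For context, the $\sigma$–$\beta$ correspondence mentioned above follows from a standard derivation for a Gaussian decoder (a sketch in our own notation, not necessarily the paper's): with decoder mean $\mu_\theta(z)$ and variance $\sigma^2 I$ on data $x \in \mathbb{R}^D$, the negative ELBO is

$\mathcal{L}(\theta, \phi; \sigma) = \frac{1}{2\sigma^2}\,\mathbb{E}_{q_\phi(z|x)}\|x - \mu_\theta(z)\|^2 + \frac{D}{2}\log(2\pi\sigma^2) + \mathrm{KL}\big(q_\phi(z|x)\,\|\,p(z)\big).$

For any fixed $\sigma$, rescaling by $2\sigma^2$ leaves the minimizers over $(\theta, \phi)$ unchanged and yields $\mathbb{E}_{q_\phi}\|x - \mu_\theta(z)\|^2 + \beta\,\mathrm{KL}(q_\phi \,\|\, p)$ with $\beta = 2\sigma^2$, i.e., a $\beta$-VAE objective; the difference, as the abstract notes, is that $\sigma$ is optimized jointly with the model while $\beta$ is held fixed.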
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Yingzhen_Li1
Submission Number: 3394