Doubly Robust Conditional VAE via Decoder Calibration: An Implicit KL Annealing Approach

Published: 20 Jan 2025, Last Modified: 20 Jan 2025
Accepted by TMLR (CC BY 4.0)
Abstract: Several variants of the Variational Autoencoder (VAE) have been developed to address its inherent limitations. Specifically, $\sigma$-VAE uses a scaled identity matrix $\sigma^2 I$ as the decoder variance, while $\beta$-VAE introduces a hyperparameter $\beta$ to reweight the negative ELBO loss. However, a unified theoretical and practical understanding of model optimality remains elusive. For example, existing learning theories on the global optimality of VAEs provide limited insight into their empirical success. Previous work established the mathematical equivalence between the variance scalar $\sigma^2$ and the hyperparameter $\beta$ in shaping the loss landscape; yet, while $\beta$-annealing is widely used, it remains unclear how to implement $\sigma$-annealing. This paper presents a comprehensive analysis of $\sigma$-CVAE, highlighting its enhanced expressiveness in parameterizing conditional densities while addressing the estimation challenges arising from suboptimal variational inference. In particular, we propose Calibrated Robust $\sigma$-CVAE, a doubly robust algorithm that facilitates accurate estimation of $\sigma$ while effectively preventing posterior collapse of the variational posterior $q_\phi$. Our approach, which leverages functional neural decomposition and KL annealing, provides a unified framework for understanding both $\sigma$-VAE and $\beta$-VAE in terms of parameter optimality and training dynamics. Experimental results on synthetic and real-world datasets demonstrate the superior performance of our method across various conditional density estimation tasks, highlighting its value for accurate and reliable probabilistic modeling.
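To make the $\sigma^2 \leftrightarrow \beta$ equivalence concrete: with a Gaussian decoder $\mathcal{N}(\hat{x}, \sigma^2 I)$, the per-sample negative ELBO is $\|x - \hat{x}\|^2 / (2\sigma^2) + d \log \sigma + \mathrm{KL}(q_\phi \| p)$, and multiplying through by $2\sigma^2$ reweights the KL term exactly as $\beta = 2\sigma^2$ would, so annealing $\sigma$ anneals $\beta$ implicitly. Below is a minimal PyTorch-style sketch of this loss under a diagonal Gaussian posterior; it illustrates the equivalence only and is not the paper's Calibrated Robust $\sigma$-CVAE algorithm (see the linked repository for that), and `sigma_schedule` is a hypothetical geometric schedule added for illustration.

```python
import math
import torch

def neg_elbo_sigma(x, x_hat, mu, logvar, sigma):
    """Negative ELBO for a sigma-VAE with Gaussian decoder N(x_hat, sigma^2 I).

    The reconstruction term is ||x - x_hat||^2 / (2 sigma^2) + d * log(sigma);
    multiplying the loss by 2 sigma^2 shows the KL term is reweighted by
    beta = 2 sigma^2, i.e. the sigma^2 <-> beta equivalence in the abstract.
    """
    d = x[0].numel()  # data dimensionality per sample
    recon = ((x - x_hat) ** 2).flatten(1).sum(dim=1) / (2 * sigma ** 2) \
            + d * math.log(sigma)
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a diagonal Gaussian posterior
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)
    return (recon + kl).mean()

def sigma_schedule(step, total_steps, sigma_max=1.0, sigma_min=0.1):
    """Hypothetical geometric sigma-annealing schedule: shrinking sigma from
    sigma_max to sigma_min implicitly anneals beta = 2 sigma^2 downward."""
    t = min(step / total_steps, 1.0)
    return sigma_max * (sigma_min / sigma_max) ** t
```

Calling `neg_elbo_sigma(x, x_hat, mu, logvar, sigma_schedule(step, total_steps))` inside a training loop would then realize $\sigma$-annealing without any change to the $\beta = 1$ ELBO code path.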
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Apart from uploading the GitHub repository, the following changes were made in the camera-ready revision:
1. **Introduction**: Takeaway #1 was rephrased to emphasize its distinction from existing learning theories.
2. **Related Work**: A detailed review of approximation theories was provided in the first paragraph.
3. **Section 3.2**: The paragraphs after Theorem 3.4 were rephrased to further clarify the approximation theory and its practical connection to posterior collapse.
Code: https://github.com/chuanhuiliu/calibrated_cvae
Assigned Action Editor: ~Yingzhen_Li1
Submission Number: 3394