TL;DR: Variational inference for survival models goes beyond the standard paradigm and requires adaptations to handle censored data.
Abstract: This paper provides a comprehensive analysis of variational inference in latent variable models for survival analysis, emphasizing the distinctive challenges associated with applying variational methods to survival data. We identify a critical weakness in the existing methodology, demonstrating how a poorly designed variational distribution may hinder the objective of survival analysis tasks—modeling time-to-event distributions. We prove that the optimal variational distribution, which perfectly bounds the log-likelihood, may depend on the censoring mechanism. To address this issue, we propose censor-dependent variational inference (CDVI), tailored for latent variable models in survival analysis. More practically, we introduce CD-CVAE, a V-structure Variational Autoencoder (VAE) designed for the scalable implementation of CDVI. Further discussion extends some existing theories and training techniques to survival analysis. Extensive experiments validate our analysis and demonstrate significant improvements in the estimation of individual survival distributions.
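As a rough illustration of why the variational distribution might need to see the censoring indicator, the bound below sketches an evidence lower bound for a right-censored observation. The notation is assumed here and is not taken from the paper: covariates $x$, observed time $y$, event indicator $\delta$ (1 if the event was observed, 0 if censored), latent variable $z$, density $f_\theta$, survival function $S_\theta$, and non-informative censoring.

```latex
\log p_\theta(y, \delta \mid x)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x, y, \delta)}
  \bigl[\, \delta \log f_\theta(y \mid x, z)
        + (1 - \delta) \log S_\theta(y \mid x, z) \,\bigr]
  \;-\;
  \mathrm{KL}\bigl( q_\phi(z \mid x, y, \delta) \,\|\, p_\theta(z \mid x) \bigr)
```

Under these assumptions the bound is tight when $q_\phi(z \mid x, y, \delta)$ equals the exact posterior $p_\theta(z \mid x, y, \delta)$, which in general differs between $\delta = 1$ and $\delta = 0$. A variational family that ignores $\delta$ therefore cannot match both cases at once.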
Lay Summary: Supervised learning has full labels and unsupervised learning has none. Censored data sits in between, providing only partial outcome information: for example, we may know that a patient survived up to a certain time without knowing the exact time of death.
Does the standard inference paradigm used in latent probabilistic models fall short in handling such data? If so, what changes are needed to address this challenge? Our paper gives a clear and well-supported “yes” and shows that the key is teaching the model to recognize whether each outcome is fully observed—a concept we call a censor-dependent structure in the inference process. This idea allows many existing theories and algorithms to be extended beyond fully labeled settings to better handle censored data, and we explore several of these extensions in the paper. To support practical use, we release a free, easy-to-use algorithm called CD-CVAE, which achieves state-of-the-art performance.
Our findings provide guidance for building scalable probabilistic models that treat censored datasets with care.
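To make the notion of partial outcome information concrete, here is a minimal sketch of a right-censored log-likelihood. It is a toy illustration and not the CD-CVAE algorithm: the exponential model, the function name `censored_exponential_loglik`, and the sample data are all assumptions chosen for brevity. Observed events contribute the log density, while censored records contribute only the log probability of surviving past the recorded time.

```python
import numpy as np

def censored_exponential_loglik(times, events, rate):
    """Log-likelihood of right-censored data under an Exponential(rate) model.

    times  : recorded time (event time if events == 1, censoring time otherwise)
    events : 1 if the event was observed, 0 if the record is right-censored
    rate   : positive rate parameter of the exponential distribution
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    log_density = np.log(rate) - rate * times   # log f(t), used for observed events
    log_survival = -rate * times                # log S(t) = log P(T > t), used for censored records
    return float(np.sum(events * log_density + (1.0 - events) * log_survival))

# Hypothetical data: the second patient is censored at t = 5.
times = [2.0, 5.0, 3.0]
events = [1, 0, 1]

# For the exponential model the maximum-likelihood rate has a closed form:
# number of observed events divided by total time at risk.
rate_mle = sum(events) / sum(times)
loglik_at_mle = censored_exponential_loglik(times, events, rate_mle)
```

The same per-record switch between density and survival terms is what a censor-aware inference model would need to account for, which is why the event indicator matters throughout.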
Link To Code: https://github.com/ChuanhuiLiu/CDVI
Primary Area: Probabilistic Methods->Variational Inference
Keywords: survival analysis, variational inference, variational autoencoders, time-to-event modeling
Submission Number: 12192