Learning Generative Population Models From Multiple Clinical Datasets Via Probabilistic Programming

Published: 17 Jun 2024, Last Modified: 17 Jun 2024AccMLBio PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Probabilistic programming, Bayesian inference, oncology, structure learning
TL;DR: We learn probabilistic program models of clinical populations by combining data from multiple clinical datasets, and show the models are accurate and can be queried efficiently.
Abstract: Accurate, efficient generative models of clinical populations could accelerate clinical research and improve patient outcomes. For example, such models could infer probable treatment outcomes for different subpopulations, generate high-fidelity synthetic data that can be shared across organizational boundaries, and discover new relationships among clinical variables. Using Bayesian structure learning, we show that it is possible to learn probabilistic program models of clinical populations by combining data from multiple, sparsely overlapping clinical datasets. Through experiments with multiple clinical trials and real-world evidence from census health surveys, we show that our model generates higher quality synthetic data than neural network baselines, supports more accurate inferences across datasets than traditional statistical methods, and can be queried more efficiently than both, opening up new avenues for accessible and efficient AI assistance in clinical research.
Submission Number: 34
Loading