On Consistent Bayesian Inference from Synthetic Data

10 May 2023 (modified: 12 Dec 2023)Submitted to NeurIPS 2023EveryoneRevisionsBibTeX
Keywords: synthetic data, Bayesian inference, Bernstein-von Mises theorem, differential privacy
TL;DR: We prove that consistent Bayesian inference from synthetic data reusing existing samplers is possible with multiple large synthetic datasets.
Abstract: Generating synthetic data, with or without differential privacy, has attracted significant attention as a potential solution to the dilemma between making data easily available, and the privacy of data subjects. Several works have shown that consistency of downstream analyses from synthetic data, including accurate uncertainty estimation, requires accounting for the synthetic data generation. There are very few methods of doing so, most of them for frequentist analysis. In this paper, we study how to perform consistent Bayesian inference from synthetic data. We prove that mixing posterior samples obtained separately from multiple large synthetic datasets converges to the posterior of the downstream analysis under standard regularity conditions when the analyst's model is compatible with the data provider's model. We show experimentally that this works in practice, unlocking consistent Bayesian inference from synthetic data while reusing existing downstream analysis methods.
Supplementary Material: zip
Submission Number: 6883
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview