Abstract: Many modern statistical analysis and machine learning applications require training models on sensitive user data. Under a formal definition of privacy protection, differentially private algorithms inject calibrated noise into the confidential data or during the data analysis process to produce privacy-protected datasets or queries. However, restricting access to only privatized data during statistical analysis makes it computationally challenging to make valid statistical inferences. In this work, we propose simulation-based inference methods from privacy-protected datasets. In addition to sequential Monte Carlo approximate Bayesian computation, we adopt neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.
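As a rough illustration of the setting described in the abstract, the sketch below runs plain rejection ABC on a single privatized count query. It is a minimal toy example under stated assumptions, not the SMC-ABC or neural density estimation procedures proposed in the paper: the Bernoulli data model, the uniform prior, the Laplace mechanism with sensitivity 1, the tolerance, and all function names are hypothetical choices made here for illustration. The point it shows is that the simulator reproduces the privacy mechanism, so the noise is accounted for in the approximate posterior rather than ignored.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(data, epsilon, rng):
    # Assumed mechanism: Laplace noise on a count query (sensitivity 1).
    return sum(data) + laplace_noise(1.0 / epsilon, rng)

def simulate_data(theta, n, rng):
    # Assumed toy data model: n i.i.d. Bernoulli(theta) records.
    return [1 if rng.random() < theta else 0 for _ in range(n)]

def rejection_abc(s_obs, n, epsilon, num_draws, tol, rng):
    # Keep prior draws whose simulated *privatized* query lands
    # within `tol` of the observed privatized query.
    accepted = []
    for _ in range(num_draws):
        theta = rng.random()  # Uniform(0, 1) prior (an assumption)
        s_sim = private_count(simulate_data(theta, n, rng), epsilon, rng)
        if abs(s_sim - s_obs) <= tol:
            accepted.append(theta)
    return accepted

rng = random.Random(0)
n, epsilon, true_theta = 200, 1.0, 0.3

# Only this noisy count would be released to the analyst.
s_obs = private_count(simulate_data(true_theta, n, rng), epsilon, rng)

posterior_draws = rejection_abc(s_obs, n, epsilon, num_draws=5000, tol=5.0, rng=rng)
posterior_mean = sum(posterior_draws) / len(posterior_draws)
print(f"accepted {len(posterior_draws)} draws, posterior mean ~ {posterior_mean:.3f}")
```

Plain rejection sampling of this kind wastes most simulations when the prior is diffuse, which is one motivation for sequential or amortized alternatives such as those the abstract names.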
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We summarize major changes below:
1. We have updated the title to "Simulation-based Bayesian Inference from Privacy Protected Data" to better reflect the research focus of this work.
2. We reorganized Section 2 (Backgrounds), adding a review of ABC and neural density estimation methods, as well as a discussion of summary statistics.
3. We have added a new Section 3.1 to provide a detailed description of how SMC-ABC is adapted to the private data setting, and included pseudocode in Appendix C.
4. We have added a discussion of the computational complexity of the methods, with details in Appendix D.
5. In the experiments, we have extended Appendix E with additional results under different privacy loss budgets ($\epsilon = 1$ and $\epsilon = 0.1$) and analyzed how the number of layers in the flow-based model affects inference performance.
Code: https://github.com/Yifei-Xiong/Simulation-based-Bayesian-Inference-from-Privacy-Protected-Data
Supplementary Material: zip
Assigned Action Editor: ~Antti_Honkela1
Submission Number: 3577