A source data privacy framework for synthetic clinical trial data

03 Oct 2022 (modified: 05 May 2023), NeurIPS 2022 SyntheticData4ML, Readers: Everyone
Keywords: clinical trial data, synthetic data, clinical trials, privacy, differential privacy, framework
TL;DR: A privacy framework to enhance the overall privacy of synthetic clinical trial data by using technical, policy, and algorithmic controls
Abstract: Synthetic clinical trial data create opportunities for data sharing, cross-collaboration, and innovation for these valuable, siloed data sources. While the value of synthetic clinical trial data relies on the privacy preservation they offer clinical trial participants, the true degree of privacy has been questioned in recent literature. Given the highly sensitive nature of clinical trial data, especially the private health information they contain, there is an urgent need for a framework specifically designed to provide guaranteed levels of privacy for synthetic datasets generated from clinical trial data. In this paper, we propose a practical privacy framework that, by design, ensures the privacy of synthetic clinical trial data at the level of the source data and provides objective, measurable bounds on disclosure risks through a combination of technical, policy, and algorithmic controls. The proposed framework enforces privacy before synthetic datasets are generated and therefore complements the privacy-preserving attributes intrinsic to the algorithms used for synthetic data generation. To demonstrate how the components of the framework address the privacy requirements of clinical trial data, we discuss how the framework responds to a set of realistic adversarial scenarios. Ultimately, we believe the proposed framework can foster more privacy research in clinical trial data sharing.