Keywords: synthetic data, clinical trials, machine learning, privacy
TL;DR: A method to generate a synthetic copy of a clinical trial data while preserving fidelity of the data and privacy of the subjects.
Abstract: Clinical trials capture high-quality data for millions of patients each year, yet these data are largely unavailable for research beyond the scope of any individual trial due to a combination of regulatory, intellectual property, and patient privacy barriers. Synthetic clinical trial data that captures the analytical properties of the source data, could provide significant value for research and drug development by making insights widely available while protecting the privacy of the participants. We present a method for generating research-grade synthetic clinical trial data from a real data source. We compared the fidelity and privacy preservation performance of our method to the state-of-the-art deep learning synthesizers and found that our synthesizer had superior performance when applied to clinical trial data as assessed both by established metrics and when considering critical clinical features. We also demonstrate how the privacy settings may be configured to conform to specific privacy policies governing data sharing.