Track: Traditional track
Keywords: Synthetic data, distribution shift, generative models, differential privacy
Abstract: Effective collaboration across medical institutions presents a significant challenge, primarily due to the imperative of maintaining patient privacy. Optimal machine learning models in healthcare demand access to extensive, high-quality data to achieve generality and robustness. Yet, typically, medical institutions are restricted to data within their networks, limiting the scope and diversity of information. This limitation is especially pronounced in the case of patients with rare or unique characteristics, resulting in decreased accuracy for this minority group. To address these challenges, our work introduces a framework designed to enhance existing clinical foundation models, Private Synthetic Hypercube Augmentation (PriSHA). We leverage generative models to produce synthetic data, generated from diverse sources, as a means to augment these models while adhering to strict privacy standards. This approach promises to broaden the dataset's scope and improve model performance without compromising patient confidentiality. To our knowledge, our framework is the first synthetic data augmentation framework that merges privacy-preserving tabular data and real data from multiple sources.
Presentation And Attendance Policy: I have read and agree with the symposium's policy on behalf of myself and my co-authors.
Ethics Board Approval: Yes, we have/will include(d) information about IRB approval or its equivalent, in the manuscript.
Data And Code Availability: No, we will not be making any data and/or code public.
Primary Area: Challenges limiting the adoption of modern ML in healthcare
Student First Author: Yes, the primary author of the manuscript is a student.
Submission Number: 17
Loading