Abstract: Acquiring large quantities of data and annotations is effective
for developing high-performing deep learning models, but is difficult
and expensive in the healthcare context. Adding synthetic training
data produced by generative models offers a low-cost way to address
data scarcity, and can also mitigate data imbalance and patient privacy
concerns. In this study, we propose a comprehensive framework
that fits seamlessly into model development workflows for medical
image analysis. We demonstrate, with datasets of varying size, (i) the
benefits of generative models as a data augmentation method; (ii) how
adversarial methods can protect patient privacy via data substitution;
and (iii) novel performance metrics for these use cases, computed by
testing models on real holdout data. We show that training with both
synthetic and real data outperforms training with real data alone, and
that models trained solely on synthetic data approach the performance
of their real-only counterparts. Code is
available at https://github.com/Global-Health-Labs/US-DCGAN.