Diffusion Model-Based Data Augmentation for Lung Ultrasound Classification with Limited Data

Xiaohui Zhang, Ahana Gangopadhyay, Hsi-Ming Chang, Ravi Soni

Published: 01 Jan 2023, Last Modified: 07 Oct 2024, Venue: ML4H@NeurIPS 2023, License: CC BY-SA 4.0

Abstract: Deep learning models typically require large quantities of data to generalize well. However, acquiring labeled medical imaging data is expensive, particularly for rare pathologies. While standard data augmentation is routinely used to increase data variety, it may not be sufficient to improve the performance of downstream tasks with a clinical diagnostic purpose. Here we investigate the applicability of SinDDM (Kulikov et al., 2023), a single-image denoising diffusion model, to medical image data augmentation with lung ultrasound (LUS) images. We conducted qualitative and quantitative evaluations of the perceptual quality of the generated images. We also used a multi-class classification task, detecting various pathologies from LUS images, to demonstrate the effectiveness of synthetic data augmentation with SinDDM. We further evaluated the image generation performance of FewDDM, an extended version of SinDDM trained on a limited number of images instead of a single image. Our results show that both SinDDM and FewDDM generate images of higher quality than single-image generative adversarial networks (GANs), and that both are highly effective for augmenting medical imaging datasets with a limited number of samples to improve downstream task performance.