Informative Synthetic Data Generation for Thorax Disease Classification

Published: 07 May 2025, Last Modified: 13 Jun 2025UAI 2025 PosterEveryoneRevisionsBibTeXCC BY 4.0
Keywords: Informative Data Selection, Generative Data Augmentation, Thorax Disease Classification
TL;DR: We propose a novel method for sample re-weighting, termed Informative Data Selection (IDS), which enhances thorax disease classification through training classifiers on synthetic data augmentation with proper importance weights.
Abstract: Deep Neural Networks (DNNs), including architectures such as Vision Transformers (ViTs), have achieved remarkable success in medical imaging tasks. However, their performance typically hinges on the availability of large-scale, high-quality labeled datasets—resources that are often scarce or infeasible to obtain in medical domains. Generative Data Augmentation (GDA) offers a promising remedy by supplementing training sets with synthetic data generated via generative models like Diffusion Models (DMs). Yet, this approach introduces a critical challenge: synthetic data often contains significant noise, which can degrade the performance of classifiers trained on such augmented datasets. Prior solutions, including data selection and re-weighting techniques, often rely on access to clean metadata or pretrained external classifiers. In this work, we propose \emph{Informative Data Selection} (IDS), a principled sample re-weighting framework grounded in the Information Bottleneck (IB) principle. IDS assigns higher weights to more informative synthetic samples, thereby improving classifier performance in GDA-enhanced training for thorax disease classification. Extensive experiments demonstrate that IDS significantly outperforms existing data selection and re-weighting baselines. Our code is publicly available at \url{https://github.com/Statistical-Deep-Learning/IDS}.
Latex Source Code: zip
Code Link: https://github.com/Statistical-Deep-Learning/IDS
Signed PMLR Licence Agreement: pdf
Readers: auai.org/UAI/2025/Conference, auai.org/UAI/2025/Conference/Area_Chairs, auai.org/UAI/2025/Conference/Reviewers, auai.org/UAI/2025/Conference/Submission112/Authors, auai.org/UAI/2025/Conference/Submission112/Reproducibility_Reviewers
Submission Number: 112
Loading