Informative Data Reweighting for Image Classification

Published: 03 Mar 2026, Last Modified: 07 Apr 2026, ICLR 2026 DeLTa Workshop Poster, CC BY 4.0
Keywords: Generative Data Augmentation, Synthetic Samples, Informative Data Reweighting, Informative Training Samples, Image Classification
TL;DR: We introduce Informative Data Reweighting (IDR), a principled Information Bottleneck-based framework that enhances image classification by prioritizing informative samples in generative data augmentation.
Abstract: Deep Neural Networks (DNNs) have achieved remarkable success in image classification. However, training them typically requires large-scale, high-quality labeled datasets, which may be scarce or infeasible to obtain for certain computer vision tasks. To alleviate this challenge, Generative Data Augmentation (GDA) enlarges the training set with synthetic samples produced by generative models such as Diffusion Models (DMs). Despite its benefits, GDA-generated synthetic samples often contain noise, which can degrade the performance of image classification models when the samples are incorporated into training. Prior approaches, including data selection and reweighting techniques, aim to address this issue but often rely on external expert models or clean metadata. In this work, we introduce Informative Data Reweighting (IDR), a principled sample reweighting framework based on the Information Bottleneck (IB) principle, to enhance the performance of DNNs trained with GDA for image classification. Through extensive experiments, we demonstrate that IDR effectively prioritizes the more informative samples in the augmented training set, which comprises both the original real samples and the synthetic samples, resulting in substantial improvements over existing data selection and reweighting strategies for GDA in image classification. The code for IDR is available at https://anonymous.4open.science/r/IDR-3BE0/.
Submission Number: 27