Generate What Matters: Steering Diffusion Models for Targeted Data Generation to Improve Classification
Keywords: Diffusion models; Reinforcement Learning
Abstract: When labeled data are scarce, augmenting training sets with images from off-the-shelf generative models can help, but simply producing more samples is often insufficient. A major limitation of existing approaches is that they overlook the usefulness of synthetic data for a given classification task, evaluating generations only retrospectively through downstream performance. To address this issue, we identify the properties that make samples effective for classification and propose a principled way to generate them. We quantify a sample’s usefulness through its influence, captured by how the classifier’s loss gradient on that sample aligns with the gradients of validation examples. Our key finding is that effective samples exhibit a clear Class-Contrastive Influence (C2I) gap: their gradients show strong positive alignment with same-class data and strong negative alignment with data from other classes. Our theoretical analysis confirms that such high-gap samples are typically hard examples located near the decision boundary, which are valuable for improving model robustness. Building on this insight, we introduce a reinforcement-learning fine-tuning scheme for diffusion models with a C2I-based reward that drives generation of class-informative, boundary-proximal samples. Across several few-shot medical imaging tasks, C2I-guided generation consistently improves both accuracy and robustness over diffusion-based baselines, demonstrating that boundary-focused augmentation provides a principled and effective strategy in low-data regimes.
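The C2I gap described in the abstract can be sketched in a few lines. The following is a minimal NumPy illustration, assuming a linear softmax classifier and cosine similarity as the gradient-alignment measure; the function names, the linear model, and the alignment choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over class logits.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def loss_grad(W, x, y):
    # Gradient of the cross-entropy loss w.r.t. the weight matrix W
    # for a single sample (x, y), flattened into one vector.
    p = softmax(W @ x)
    p[y] -= 1.0
    return np.outer(p, x).ravel()

def c2i_gap(W, x_syn, y_syn, X_val, y_val):
    # Class-Contrastive Influence gap of one synthetic sample:
    # mean cosine alignment of its loss gradient with same-class
    # validation gradients, minus the mean alignment with gradients
    # of validation samples from other classes.
    g = loss_grad(W, x_syn, y_syn)
    g /= np.linalg.norm(g) + 1e-12
    same, other = [], []
    for xv, yv in zip(X_val, y_val):
        gv = loss_grad(W, xv, yv)
        gv /= np.linalg.norm(gv) + 1e-12
        (same if yv == y_syn else other).append(float(g @ gv))
    return float(np.mean(same) - np.mean(other))
```

A synthetic sample with a large positive gap helps its own class while pushing against the others; in the paper this quantity serves as the reward for reinforcement-learning fine-tuning of the diffusion model.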
Supplementary Material: zip
Primary Area: generative models
Submission Number: 9808