More Data or Better Algorithms: Latent Diffusion Augmentation for Deep Imbalanced Regression

ICLR 2026 Conference Submission17823 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0

Keywords: Deep imbalanced regression, Diffusion models, Latent space, Augmentation

TL;DR: We propose a data-level method using conditional diffusion models to address data imbalance in deep imbalanced regression.

Abstract: In many real-world regression tasks, the data distribution is heavily skewed, so models learn predominantly from abundant majority samples and fail to predict minority labels accurately. Imbalanced classification has been studied extensively, but imbalanced regression remains relatively unexplored. Deep imbalanced regression (DIR) refers to the setting in which the input data are high-dimensional and unstructured. Although several data-level approaches exist for tabular imbalanced regression, DIR currently lacks dedicated data-level solutions suited to high-dimensional data and relies primarily on algorithmic modifications. To fill this gap, we propose LatentDiff, a novel framework that uses conditional diffusion models with priority-based generation to synthesize high-quality features in the latent representation space. LatentDiff is computationally efficient and applicable across diverse data modalities, including images, text, and other high-dimensional inputs. Experiments on three DIR benchmarks demonstrate substantial improvements in minority regions while maintaining overall accuracy.
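
The abstract does not say how generation priorities are computed. As an illustration only, the sketch below shows one plausible way a synthetic-sample budget could be split across label bins so that sparse label regions receive more samples. The shortfall-based rule, the function names, and the parameters are all assumptions made for this example, not LatentDiff's actual procedure, and the conditional diffusion model that would generate the latent features is not shown.

```python
from collections import Counter


def bin_index(label, low, width, num_bins):
    """Map a continuous label to a bin index, clamped to [0, num_bins - 1]."""
    idx = int((label - low) // width)
    return max(0, min(num_bins - 1, idx))


def generation_budget(labels, num_bins, total_synthetic):
    """Split a synthetic-sample budget across equal-width label bins.

    Assumption (not from the paper): a bin's priority is its shortfall
    relative to the most populated bin, so sparse regions get more samples.
    """
    low, high = min(labels), max(labels)
    # Fall back to a unit width when all labels are identical.
    width = (high - low) / num_bins or 1.0
    counts = Counter(bin_index(y, low, width, num_bins) for y in labels)
    largest = max(counts.values())
    shortfall = [largest - counts.get(b, 0) for b in range(num_bins)]
    total_shortfall = sum(shortfall)
    if total_shortfall == 0:
        return [0] * num_bins  # bins are already balanced
    # Integer division may leave a few samples of the budget unassigned.
    return [total_synthetic * s // total_shortfall for s in shortfall]


# Skewed toy labels: six majority samples, three mid-range, one minority.
labels = [0.0] * 6 + [1.5] * 3 + [3.0]
print(generation_budget(labels, num_bins=3, total_synthetic=16))
```

Under this rule the majority bin receives no synthetic samples and the sparsest bin receives the largest share. In a full pipeline, each bin's share would be the number of latent features requested from the generator, conditioned on labels drawn from that bin.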

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Submission Number: 17823
