Model Debiasing by Learnable Data Augmentation

Published: 26 Apr 2026, Last Modified: 26 Apr 2026, Accepted by TMLR, License: CC BY 4.0
Abstract: Deep Neural Networks are well known for efficiently fitting training data, yet they generalize poorly whenever some kind of bias dominates over the actual task labels, resulting in models that learn “shortcuts”. In essence, such models are prone to learning spurious correlations between data and labels. In this work, we tackle the problem of learning from biased data in the very realistic unsupervised scenario, i.e., when the bias is unknown. This is a much harder task than the supervised case, where auxiliary, bias-related annotations can be exploited in the learning process. This paper proposes a novel 2-stage learning pipeline featuring a data augmentation strategy able to regularize the training. First, biased/unbiased samples are identified by training over-biased models. Second, this (typically noisy) subdivision is exploited within a data augmentation framework that combines the original samples while learning the mixing parameters, which has a regularization effect. Experiments on synthetic and realistic biased datasets show state-of-the-art classification accuracy, outperforming competing methods and demonstrating robust performance on both biased and unbiased examples. Notably, because our training method is agnostic to the level of bias, it also improves performance on any dataset, even apparently unbiased ones, thus improving model generalization regardless of the level of bias (or its absence) in the data.
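The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how samples are identified or how they are combined, so everything concrete here is an assumption made for illustration: the loss-threshold split, the convex (mixup-style) combination, the sigmoid parameterization of the mixing coefficient, and the names `split_by_bias`, `mix`, and `mix_logit` are hypothetical and not taken from the paper. In a real training setup `mix_logit` would be a parameter updated by gradient descent; here it is a plain float.

```python
import math


def split_by_bias(losses, threshold):
    """Stage 1 (assumed rule): samples that an over-biased model fits
    easily (low loss) are treated as bias-aligned, the rest as
    bias-conflicting. The resulting split is expected to be noisy."""
    aligned = [i for i, loss in enumerate(losses) if loss <= threshold]
    conflicting = [i for i, loss in enumerate(losses) if loss > threshold]
    return aligned, conflicting


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mix(x_a, y_a, x_b, y_b, mix_logit):
    """Stage 2 (assumed form): convex combination of two samples and
    their one-hot labels. The coefficient is sigmoid(mix_logit), which
    keeps it in (0, 1) while the logit itself is unconstrained."""
    lam = sigmoid(mix_logit)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Hypothetical per-sample losses from an over-biased model.
losses = [0.1, 2.0, 0.05, 1.5]
aligned, conflicting = split_by_bias(losses, threshold=1.0)

# Mix one bias-aligned sample with one bias-conflicting sample.
x_mixed, y_mixed, lam = mix([1.0, 0.0], [1.0, 0.0],
                            [0.0, 1.0], [0.0, 1.0],
                            mix_logit=0.0)
```

Parameterizing the coefficient through a sigmoid is one common way to make a bounded quantity learnable; the paper may well use a different scheme.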
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=eCG5krscQy
Changes Since Last Submission: We have submitted a new revision of the paper reflecting all the modifications discussed with the reviewers during the rebuttal phase. In more detail:
- Expanded the Conclusion section to address the comment of Reviewer kV8N
- Expanded the related works section with a discussion of previous literature (i.e. “Debiasify” (Byasi et al 2025)), as suggested by Reviewer ThHQ
- Expanded the experiments section to address the question of Reviewer ThHQ on the comparison with supervised methods
- Expanded the ablation section (Table 4 and related subsection) to address the comment of Reviewer ThHQ concerning the choice of the architecture for our stage 1
- Carefully checked for typos and grammatical errors, as suggested by Reviewer kV8N
Please note that we have removed all red text for the camera-ready version of the paper. We are not sure whether we should already de-anonymize the paper at this stage.
Assigned Action Editor: ~Vinay_P_Namboodiri1
Submission Number: 5562