Do Bias Mitigation Methods Generalize? A Cross-Modality Study

23 Jan 2026 (modified: 14 Apr 2026), Under review for TMLR, CC BY 4.0
Abstract: Spurious correlations, defined as predictive but non-causal relationships within training data, constitute a significant challenge for deep learning. When such shortcuts exist in a dataset, models tend to exploit them instead of learning the intended, task-relevant features, resulting in biased predictions and poor generalization. Although numerous bias mitigation methods have been developed, they have been evaluated primarily on natural images. Their ability to generalize to other modalities and domains remains largely unexplored: text (e.g., occupational gender imbalance and lexical bias), audio (e.g., demographic disparities and device signatures), medical imaging (e.g., hospital-level biases such as scanner or protocol differences), and video (e.g., scene background bias). In this work, we conduct the first comprehensive cross-modality benchmark study, evaluating 14 bias mitigation methods across 6 datasets spanning text, audio, medical imaging, and video. For each dataset, we introduce tailored configurations designed to assess bias mitigation performance. Our findings show that several methods provide consistent improvements across modalities, with a subset exhibiting statistically significant bias mitigation in all domains. This study offers the first systematic evidence of cross-modal generalization for bias mitigation approaches and establishes a benchmark resource aimed at encouraging the development of bias mitigation methods that extend beyond the natural-image domain. Code and data will be released publicly upon acceptance.
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: Fixed double-blind policy violation.
Assigned Action Editor: ~Enzo_Tartaglione1
Submission Number: 7118