Debiasing CLIP with Neural Interventions

Published: 31 Mar 2026, Last Modified: 23 Dec 2025, 48th European Conference on Information Retrieval (ECIR 2026), Everyone, WM2024 Conference
Abstract: This paper presents an inference-time method for mitigating demographic bias in CLIP-like cross-modal retrieval models through targeted neural interventions in their internal attention mechanisms. We first identify “expert” attention heads that encode demographic information by systematically analyzing CLIP’s internal representations in response to labeled inputs. At inference, we intervene on these heads, either replacing their activations with demographic prototypes or neutralizing them (zero ablation). We intervene specifically at the CLS token, as it aggregates information globally across image patches and directly determines the final image embedding. Across fairness benchmarks such as SISPI and So-B-IT, our interventions achieve bias reduction comparable to or exceeding that of state-of-the-art methods, while being substantially lighter and requiring no retraining.
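The two interventions named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the per-head attention outputs at the CLS token are already available as plain vectors, and the names `intervene_cls_heads` and `mean_prototype` are hypothetical. Building a prototype as the mean of recorded activations is likewise an assumption, since the abstract does not say how prototypes are constructed.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def intervene_cls_heads(
    cls_head_outputs: Sequence[Vector],
    expert_heads: Sequence[int],
    prototypes: Optional[Dict[int, Vector]] = None,
) -> List[Vector]:
    """Return a copy of the per-head CLS activations with expert heads overridden.

    cls_head_outputs[h] is head h's output vector at the CLS token position.
    An expert head with an entry in `prototypes` gets that vector; any other
    expert head is zero-ablated. Non-expert heads are left unchanged.
    """
    prototypes = prototypes or {}
    patched = [list(v) for v in cls_head_outputs]
    for h in expert_heads:
        if h in prototypes:
            if len(prototypes[h]) != len(patched[h]):
                raise ValueError(f"prototype for head {h} has the wrong dimension")
            patched[h] = list(prototypes[h])
        else:
            patched[h] = [0.0] * len(patched[h])
    return patched


def mean_prototype(activations: Sequence[Vector]) -> Vector:
    """Average recorded activations into one prototype (assumed construction)."""
    n = len(activations)
    return [sum(column) / n for column in zip(*activations)]


# Toy example: three heads with two-dimensional outputs; head 1 is the "expert".
heads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
ablated = intervene_cls_heads(heads, expert_heads=[1])
prototype = mean_prototype([[1.0, 1.0], [3.0, 5.0]])
replaced = intervene_cls_heads(heads, expert_heads=[1], prototypes={1: prototype})
```

In a real model, the same override would typically be applied inside the forward pass (for example through a hook on the attention layer) before the heads are concatenated and projected, so that the change propagates to the final image embedding.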