Debiasing Diffusion Models via Score Guidance

TMLR Paper 6747 Authors

01 Dec 2025 (modified: 15 Dec 2025) · Under review for TMLR · CC BY 4.0
Abstract: With the increasing use of Diffusion Models (DMs) in everyday applications, it is important to ensure that these models are \textit{fair} towards various demographic and societal groups. However, for a variety of reasons, DMs inherit biases towards specific genders, races, and communities, which can perpetuate and amplify societal inequities. Hence, it is important to \textit{debias} DMs. Previous debiasing approaches require additional reference data, model fine-tuning, or auxiliary classifier training, each of which incurs additional cost. In this work, we provide a training-free, inference-time method for debiasing diffusion models. First, we provide a theoretical explanation for the cause of the biases exhibited by DMs. Specifically, we show that the unconditional score predicted by the denoiser can be expressed as a convex combination of conditional scores corresponding to the attributes under consideration. We then argue that the weights allocated to underrepresented attributes are smaller, which leads to the domination of other attributes in the overall score function. Building on this, we propose a score-guidance method that adheres to a user-provided reference distribution during generation. Moreover, we show that this score guidance can be achieved via different modalities like `text' and `exemplar images'. To our knowledge, our method is the first debiasing framework for diffusion models that can utilize different modalities. We demonstrate the effectiveness of our method across various attributes on both unconditional and conditional text-based diffusion models, including Stable Diffusion.
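The convex-combination claim follows from standard reasoning about the score of a mixture distribution; a minimal derivation is sketched below, assuming the data distribution is a mixture over a discrete attribute $a$ (the notation $p_t$, $x_t$, $a$ is ours, chosen for illustration, not necessarily the paper's):

```latex
% Marginal over a discrete attribute a at noise level t:
%   p_t(x_t) = \sum_a p(a)\, p_t(x_t \mid a).
% Differentiating the log-marginal and applying Bayes' rule:
\nabla_{x_t} \log p_t(x_t)
  = \frac{\sum_a p(a)\, \nabla_{x_t} p_t(x_t \mid a)}{p_t(x_t)}
  = \sum_a \underbrace{p(a \mid x_t)}_{\text{posterior weight}}
    \, \nabla_{x_t} \log p_t(x_t \mid a).
% The weights p(a | x_t) are nonnegative and sum to one, so the
% unconditional score is a convex combination of conditional scores;
% an underrepresented attribute (small p(a)) typically receives a
% small posterior weight, letting majority attributes dominate.
```

Given that view, one natural way to realize inference-time score guidance toward a user reference distribution is to replace the model's implicit posterior weights with user-chosen weights. The sketch below illustrates this idea only; it is not necessarily the authors' exact procedure, and `score_model`, `attributes`, and `ref_weights` are hypothetical names.

```python
import torch

def guided_score(score_model, x_t, t, attributes, ref_weights):
    """Sketch: convex recombination of conditional scores.

    Replaces the model's implicit posterior weights p(a | x_t) with a
    user-provided reference distribution over attributes, so that no
    single attribute's conditional score dominates the update.

    score_model(x_t, t, a) -> conditional score estimate (hypothetical API)
    attributes             -> list of attribute conditions (e.g., text prompts)
    ref_weights            -> tensor of weights summing to 1, one per attribute
    """
    # Stack conditional scores along a new leading attribute axis.
    scores = torch.stack([score_model(x_t, t, a) for a in attributes])
    # Broadcast the weights over the remaining score dimensions.
    w = ref_weights.view(-1, *([1] * (scores.dim() - 1)))
    # Weighted sum over attributes: sum_a w(a) * s(x_t, t | a).
    return (w * scores).sum(dim=0)
```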
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Markus_Heinonen1
Submission Number: 6747