Learning Where It Matters: Responsible and Interpretable Text-to-Image Generation with Background Consistency
Abstract: Text-to-image diffusion models have achieved remarkable progress, yet they still struggle to produce unbiased and responsible outputs. A promising direction is to manipulate the bottleneck space of the U-Net (the $h$-space), which provides \textit{interpretability} and \textit{controllability}. However, existing methods rely on learning attributes from the entire image, entangling them with spurious features and offering no corrective mechanisms at inference. This uniform reliance leads to poor subject alignment, fairness issues, reduced photorealism, and incoherent backgrounds in scene-specific prompts. To address these challenges, we propose two complementary innovations for training and inference. First, we introduce a spatially focused concept learning framework that disentangles target attributes into concept vectors by suppressing target attribute features within the multi-head cross-attention (MCA) modules and attenuating the encoder output (i.e., $h$-vector) to ensure the concept vector exclusively captures target attribute features. In addition, we introduce a spatially weighted reconstruction loss to emphasize regions relevant to the target attribute. Second, we design an inference-time strategy that improves background consistency by enhancing low-frequency components in the $h$-space. Experiments demonstrate that our approach improves fairness, subject fidelity, and background coherence while preserving visual quality and prompt alignment, outperforming state-of-the-art $h$-space methods. The code is provided at https://github.com/Moslem-Sh21/learning-where-it-matters.
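As an illustration of the inference-time idea mentioned in the abstract (enhancing low-frequency components in the $h$-space), the following is a minimal sketch, not the authors' implementation. It assumes the $h$-vector can be treated as a feature map of shape (channels, height, width), and it uses a simple radial mask in the 2D Fourier domain. The function name `enhance_low_frequencies` and the parameters `cutoff` and `gain` are hypothetical and do not come from the paper or its repository.

```python
import numpy as np


def enhance_low_frequencies(h, cutoff=0.25, gain=1.5):
    """Scale the low spatial frequencies of a (C, H, W) feature map.

    Hypothetical knobs (assumptions, not from the paper):
      cutoff: fraction of the Nyquist frequency (0.5 cycles/sample) that
              defines the radius of the low-frequency disc.
      gain:   multiplier applied to frequencies inside that disc.
    """
    h = np.asarray(h, dtype=np.float64)
    _, height, width = h.shape

    # Frequency coordinates in cycles per sample, in [-0.5, 0.5).
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)

    # Multiply by `gain` inside the low-frequency disc, by 1 elsewhere.
    scale = np.where(radius <= cutoff * 0.5, gain, 1.0)

    # Filter each channel in the Fourier domain and transform back.
    spectrum = np.fft.fft2(h, axes=(-2, -1))
    filtered = np.fft.ifft2(spectrum * scale, axes=(-2, -1))

    # The mask is symmetric in frequency, so the result is real up to
    # floating-point rounding; drop the negligible imaginary part.
    return filtered.real
```

Under these assumptions, a spatially constant map (pure zero-frequency content) is scaled by `gain`, while a checkerboard pattern (highest spatial frequency) passes through unchanged. How the paper selects and weights the low-frequency band may differ from this radial mask.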
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Since the last TMLR submission, we have revised the manuscript to address all reviewer comments, clarified the methodological explanations, and added further experimental results and analyses. All modifications are highlighted in the updated version.
Code: https://github.com/Moslem-Sh21/learning-where-it-matters
Supplementary Material: zip
Assigned Action Editor: ~Ofir_Lindenbaum1
Submission Number: 7303