ADOR: Attention Dilution and Overlap Resolver for Complex Prompts in Text-to-Image Diffusion Models

19 Sept 2025 (modified: 12 Nov 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Image Generation, Diffusion models
Abstract: Text-to-image diffusion models have achieved remarkable progress, producing high-quality and realistic images. Nevertheless, these models still encounter challenges with semantic misalignment, particularly when required to understand complex prompts involving multiple objects and diverse attributes. Although several approaches have been proposed to address these issues, investigation into the causes of semantic misalignment has remained limited. In this work, we examine the behavior of cross-attention in text-to-image diffusion models and identify two key factors contributing to semantic misalignment: cross-attention overlap and cross-attention dilution. Building on these findings, we propose ADOR, a training-free framework that mitigates semantic misalignment in a single forward pass, without requiring external guidance. ADOR consists of two complementary modules: the Attention Overlap Disentangler (AO-Disentangler) and the Attention Dilution Reviver (AD-Reviver). The AO-Disentangler reduces cross-attention overlap between noun phrases via distance-based masking, thereby enhancing separation between object–attribute pairs. The AD-Reviver tackles the issue of reduced average cross-attention intensity that arises with longer prompts by applying L2-normalization or selective amplification. It ensures that semantic concepts remain represented during generation. We evaluate ADOR on standard benchmarks and demonstrate that it achieves state-of-the-art performance while preserving efficiency through its training-free, single-pass design.
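The abstract describes two operations on cross-attention maps: masking to reduce overlap between noun phrases, and L2-normalization or selective amplification to counteract intensity dilution on long prompts. The sketch below illustrates both ideas in minimal form; it is not the paper's implementation. All function names and parameters (`disentangle_overlap`, `revive_attention`, `threshold`, `amplify`) are hypothetical, and the winner-take-all mask is a crude stand-in for the paper's distance-based masking.

```python
import torch


def disentangle_overlap(attn_a, attn_b):
    """Sketch of the AO-Disentangler idea: reduce overlap between the
    cross-attention maps of two noun phrases by suppressing each map
    wherever the other dominates (a simplification of the paper's
    distance-based masking)."""
    mask_a = (attn_a >= attn_b).float()
    return attn_a * mask_a, attn_b * (1.0 - mask_a)


def revive_attention(attn, mode="l2", threshold=0.1, amplify=2.0):
    """Sketch of the AD-Reviver idea: counteract the drop in average
    cross-attention intensity on long prompts, either by L2-normalizing
    each token's spatial map or by selectively amplifying weak maps.
    attn: (num_tokens, H, W) cross-attention maps."""
    if mode == "l2":
        # Normalize each token's map to unit L2 norm so later tokens
        # are not drowned out as prompt length grows.
        norms = attn.flatten(1).norm(dim=1, keepdim=True).clamp_min(1e-8)
        return attn / norms.view(-1, 1, 1)
    # "amplify": boost only tokens whose mean attention fell below threshold.
    mean = attn.flatten(1).mean(dim=1)
    scale = torch.where(mean < threshold,
                        torch.full_like(mean, amplify),
                        torch.ones_like(mean))
    return attn * scale.view(-1, 1, 1)


# Toy usage on random maps standing in for two noun phrases' attention.
attn_a = torch.rand(16, 16) * 0.05
attn_b = torch.rand(16, 16) * 0.05
sep_a, sep_b = disentangle_overlap(attn_a, attn_b)
revived = revive_attention(torch.rand(5, 16, 16) * 0.05, mode="l2")
```

In a real pipeline these operations would be applied inside the denoising U-Net's cross-attention layers during a single forward pass, consistent with the training-free, single-pass design the abstract claims.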
Primary Area: generative models
Submission Number: 17669