TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: TextCenGen dynamically adapts blank regions for text-friendly image generation, improving T2I model outputs on specially collected prompt datasets that cover varied text positions.
Abstract: Text-to-image (T2I) generation has made remarkable progress in producing high-quality images, but a fundamental challenge remains: creating backgrounds that naturally accommodate text placement without compromising image quality. This capability is non-trivial for real-world applications like graphic design, where a clear visual hierarchy between content and text is essential. Prior work has primarily focused on arranging layouts within existing static images, leaving unexplored the potential of T2I models for generating text-friendly backgrounds. We present TextCenGen, a training-free approach that actively relocates objects before optimizing text regions, rather than directly reducing cross-attention, which degrades image quality. Our method introduces: (1) a force-directed graph approach that detects conflicting objects and guides their relocation using cross-attention maps, and (2) a spatial attention constraint that ensures smooth background generation in text regions. Our method is plug-and-play, requiring no additional training while balancing semantic fidelity and visual quality. Evaluated on our proposed text-friendly T2I benchmark of 27,000 images across three seed datasets, TextCenGen outperforms existing methods, achieving 23% lower saliency overlap in text regions while maintaining 98% of the original semantic fidelity as measured by CLIP score and our proposed Visual-Textual Concordance Metric (VTCM).
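The two mechanisms described in the abstract, force-directed relocation of conflicting objects guided by cross-attention maps and a spatial attention constraint over the text region, can be illustrated with a minimal Python sketch. This is not the released implementation: the function names, the inverse-square repulsion, and the damping factor are assumptions chosen for clarity, and the actual method operates inside the diffusion model's cross-attention layers during sampling rather than on standalone arrays.

```python
# Minimal illustrative sketch (not the authors' code) of the two ideas in the
# abstract: (1) a repulsive, force-directed step that moves each object's
# cross-attention centroid away from the planned text region, and (2) a
# spatial constraint that damps object attention inside that region.
import numpy as np

def attention_centroid(attn_map):
    """Centroid (y, x) of a normalized cross-attention map."""
    attn = attn_map / (attn_map.sum() + 1e-8)
    ys, xs = np.indices(attn.shape)
    return np.array([(ys * attn).sum(), (xs * attn).sum()])

def repulsion_offsets(attn_maps, text_mask, strength=5.0):
    """Force-directed step (assumed inverse-square form): push each object's
    attention centroid away from the text-region centroid."""
    region_c = attention_centroid(text_mask.astype(float))
    offsets = []
    for attn in attn_maps:
        obj_c = attention_centroid(attn)
        diff = obj_c - region_c
        dist = np.linalg.norm(diff) + 1e-8
        offsets.append(strength * diff / dist**2)  # repulsive displacement
    return offsets

def suppress_text_region(attn_maps, text_mask, factor=0.1):
    """Spatial attention constraint: damp object attention inside the text
    region so the background stays smooth there."""
    return [attn * np.where(text_mask, factor, 1.0) for attn in attn_maps]

# Toy usage: two 32x32 attention maps and a text region in the top-left corner.
h = w = 32
text_mask = np.zeros((h, w), dtype=bool)
text_mask[:10, :14] = True
attn_maps = [np.random.rand(h, w) for _ in range(2)]
print(repulsion_offsets(attn_maps, text_mask))
print(suppress_text_region(attn_maps, text_mask)[0].max())
```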
Lay Summary: When you add text to an image, like putting a caption on a photo or adding information to a poster, the text needs to be clearly visible. However, images created by AI often have important objects or busy patterns that make text hard to read when placed on top. This is a significant problem for designers who need backgrounds that work well with text.

Our research introduces TextCenGen, a new method that creates images specifically designed to accommodate text. Unlike previous approaches that try to fit text onto existing images, our method actually modifies how the AI generates the image in the first place. TextCenGen works by identifying which objects in the image would conflict with the planned text area and gently moving these objects to other parts of the image. It then ensures the text area has a smooth, clean background. This creates a harmonious balance between the image content and the space reserved for text.

Our method requires no additional training and can work with existing AI image generation tools. When tested against other approaches, TextCenGen created images that were 23% better at keeping important objects out of text areas while maintaining 98% of the image quality and meaning. This technology could help designers, marketers, and everyday users create more effective visual communications where text and images work together seamlessly, such as for social media posts, advertisements, or mobile app interfaces.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/tianyilt/TextCenGen_Background_Adapt
Primary Area: Applications->Computer Vision
Keywords: Image Synthesis
Submission Number: 6381