FreeAlign: Superior Text-Image Alignment by Modulating Prompt Attention

Published: 2025 · Last Modified: 09 Nov 2025 · ICASSP 2025 · CC BY-SA 4.0
Abstract: In recent years, Text-to-Image (T2I) models have made remarkable advancements, yet accurate association of attributes remains a key challenge. This paper presents FreeAlign, a novel training-free framework designed to enhance attribute alignment in T2I generation. By modulating attention and adapting U-Net components, FreeAlign achieves precise alignment between image attributes and textual descriptions. It strengthens attribute-target associations by refining attention maps, adjusts U-Net’s backbone and skip connections based on energy ratios, and reorders prompts to balance attribute focus. Large Language Models (LLMs) enrich prompts with diverse, contextually relevant text, enhancing the generative power and output quality of diffusion models. Extensive experiments show that FreeAlign delivers superior alignment for diverse prompts while preserving intricate details and ensuring structural integrity, establishing a new benchmark for attribute precision in T2I generation.
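The abstract's core idea of strengthening attribute-target associations through refined attention maps can be illustrated with a minimal sketch. The function below is a hypothetical simplification (the paper's actual modulation rule is not given here): it boosts the cross-attention weight on chosen attribute tokens and renormalizes each spatial row, so those tokens receive a larger share of attention. The name `modulate_attention`, the `boost` factor, and the toy shapes are all assumptions for illustration.

```python
import numpy as np

def modulate_attention(attn, target_idx, boost=1.5):
    """Hypothetical sketch of attention-map refinement.

    attn: cross-attention weights of shape (num_pixels, num_tokens),
          each row summing to 1.
    target_idx: indices of attribute/target tokens to strengthen.
    boost: multiplicative gain (an assumed hyperparameter, not from
           the paper) applied to the target-token columns.
    """
    attn = attn.copy()
    attn[:, target_idx] *= boost                # amplify target tokens
    attn /= attn.sum(axis=1, keepdims=True)     # renormalize each row
    return attn

# Toy example: 4 spatial locations attending over 5 prompt tokens.
rng = np.random.default_rng(0)
raw = rng.random((4, 5))
attn = raw / raw.sum(axis=1, keepdims=True)
out = modulate_attention(attn, target_idx=[2], boost=2.0)
```

After modulation, each row still sums to 1, but the target token's attention share strictly increases, which is the qualitative effect of steering generation toward the named attribute.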