MirrorDiff: Prompt redescription for zero-shot grounded text-to-image generation with attention modulation

Published: 2025, Last Modified: 12 Nov 2025, Eng. Appl. Artif. Intell. 2025, CC BY-SA 4.0
Abstract: Highlights
• We propose a zero-shot grounded text-to-image-text framework for image generation.
• We use a Large Language Model as a layout generator to produce the scene layout.
• We design a layout-guided attention modulation to mitigate the loss of small objects.
• We present semantic text regeneration supervision to align the regenerated text with the input text.