MirrorDiff: Prompt redescription for zero-shot grounded text-to-image generation with attention modulation

Chang Liu, Mingwen Shao, Zhengyi Gong, Xiang Lv, Lingzhuang Meng

Published: 01 Aug 2025, Last Modified: 16 Nov 2025 | Engineering Applications of Artificial Intelligence | CC BY-SA 4.0
Abstract: Highlights
• We propose a zero-shot grounded text-to-image-text framework for image generation.
• We use a Large Language Model as a layout generator to produce the scene layout.
• We design a layout-guided attention modulation to mitigate the loss of small objects.
• We present semantic text regeneration supervision to align the regenerated text with the input text.