LayoutDM: Precision Multi-Scale Diffusion for Layout-to-Image

Published: 2024, Last Modified: 22 Jan 2026, ICME 2024, CC BY-SA 4.0
Abstract: The layout-to-image domain has recently gained significant traction in the research literature. GAN-based methods can generate complete images, but they still suffer from insufficient detail and limited overall quality. Diffusion models, on the other hand, still face challenges in ensuring the quality of image details in complex scene regions and the smooth expression of overall semantic information. To address these challenges, we propose LayoutDM, a multi-scale diffusion model that employs a Parallel Sampling Module to enhance local precision and a Semantic Coherence Module to ensure global semantic coherence. Notably, our approach generates within the visible space, progressively revealing more details and semantic information. At the same time, our method improves the handling of layout regions via parallel region-wise CLIP guidance, achieving strong zero-shot generation without direct training samples. Experiments on the COCO-Stuff and VG datasets confirm that our approach achieves both fine-grained object generation and overall visual effectiveness.
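To make the region-wise CLIP guidance idea concrete, the following is a minimal sketch of one plausible formulation: each layout box is cropped from the current clean-image estimate, scored against its category label with CLIP, and the gradient of the summed similarity is added to the sampler's mean, classifier-guidance style. The abstract does not specify LayoutDM's actual implementation; the function name `region_clip_grad`, its signature, the per-box loop, and the guidance `scale` are all assumptions for illustration (the loop could be batched to match the "parallel" wording).

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # keep fp32 to match the image tensor dtype

def region_clip_grad(x0_hat, boxes, labels, scale=1.0):
    """Hypothetical region-wise CLIP guidance term (not the paper's code).

    x0_hat: (1, 3, H, W) predicted clean image in [0, 1]
    boxes:  list of (x1, y1, x2, y2) pixel coordinates, one per layout region
    labels: list of category strings, one per box
    Returns a gradient to add to the sampler's mean at each denoising step.
    """
    x = x0_hat.detach().requires_grad_(True)
    tokens = clip.tokenize(labels).to(x.device)
    with torch.no_grad():  # text embeddings need no gradient
        text_emb = F.normalize(clip_model.encode_text(tokens), dim=-1)

    sim = x.new_zeros(())
    for (x1, y1, x2, y2), t in zip(boxes, text_emb):
        crop = x[:, :, y1:y2, x1:x2]                          # one layout region
        crop = F.interpolate(crop, size=(224, 224),
                             mode="bicubic", align_corners=False)  # CLIP input size
        # CLIP's pixel mean/std normalization is omitted here for brevity.
        img_emb = F.normalize(clip_model.encode_image(crop), dim=-1)
        sim = sim + (img_emb * t).sum()                       # cosine similarity

    (grad,) = torch.autograd.grad(sim, x)
    return scale * grad
```

Because each box is scored independently against its own label, the guidance pushes every region toward its category without a single global prompt, which is consistent with the zero-shot behavior the abstract claims.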