Scene Layout Generation with Rectified Flow

TMLR Paper6941 Authors

09 Jan 2026 (modified: 26 Jan 2026) · Under review for TMLR · CC BY 4.0
Abstract: We introduce SLayR (Scene Layout Generation with Rectified flow), a novel transformer-based model for text-to-layout generation that can be integrated into a complete text-to-image pipeline. SLayR targets a setting in which current text-to-image pipelines struggle: generating scene layouts that are both varied and plausible when the prompt is ambiguous and places no constraints on the scene. In this setting, SLayR surpasses existing baselines, including LLMs. To evaluate layout generation accurately, we introduce a new benchmark suite that includes numerical metrics and a carefully designed, repeatable human-evaluation procedure assessing the plausibility and variety of the generated images. We show that our method sets a new state of the art in achieving high plausibility and variety simultaneously, while using at least 3× fewer parameters.
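The abstract does not detail SLayR's architecture or training objective. As background only, the following is a minimal sketch of generic rectified flow applied to a layout, assuming the layout is a fixed-size list of boxes encoded as hypothetical (x, y, w, h) rows. The function names are illustrative, not the authors' code, and the usage example replaces the learned velocity model with a constant oracle; a trained network (the abstract describes SLayR as transformer-based) would fill that role in practice.

```python
import random

# Assumed layout encoding (hypothetical): one row per box, (x, y, w, h).
Layout = list[list[float]]


def interpolate(x0: Layout, x1: Layout, t: float) -> Layout:
    """Point on the straight path x_t = (1 - t) * x0 + t * x1."""
    return [[(1 - t) * a + t * b for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]


def target_velocity(x0: Layout, x1: Layout) -> Layout:
    """Rectified-flow regression target: the constant velocity x1 - x0."""
    return [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(x0, x1)]


def rf_loss(pred_v: Layout, x0: Layout, x1: Layout) -> float:
    """Mean squared error between a predicted velocity and the target x1 - x0."""
    tgt = target_velocity(x0, x1)
    sq = [(p - q) ** 2 for rp, rq in zip(pred_v, tgt) for p, q in zip(rp, rq)]
    return sum(sq) / len(sq)


def euler_sample(velocity_fn, x0: Layout, num_steps: int = 10) -> Layout:
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (layout) with Euler steps."""
    x = [row[:] for row in x0]
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [[xi + dt * vi for xi, vi in zip(rx, rv)] for rx, rv in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    x1 = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.5, 0.2, 0.2]]  # toy "data" layout
    x0 = [[rng.gauss(0.0, 1.0) for _ in row] for row in x1]  # Gaussian noise
    # Oracle velocity (stand-in for a trained model): constant along the straight path.
    oracle = lambda x, t: target_velocity(x0, x1)
    print(euler_sample(oracle, x0, num_steps=4))
```

Because rectified flow trains the model toward straight noise-to-data paths, a velocity field that is exactly constant along each path lets even a few Euler steps recover the target, which is what the oracle illustrates; a learned model only approximates this.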
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Zhe_Gan1
Submission Number: 6941