SLayR: Scene Layout Generation with Rectified Flow

Published: 23 Sept 2025, Last Modified: 23 Dec 2025
Venue: SPIGM @ NeurIPS
License: CC BY 4.0
Keywords: scene layout generation, text to image generation, rectified flow, explainable AI
TL;DR: We introduce a novel method for scene layout generation using rectified flow, integrate it into a text-to-image pipeline, and demonstrate the plausibility and variety of its generations.
Abstract: We introduce SLayR, Scene Layout Generation with Rectified flow, a novel transformer-based model for text-to-layout generation that can be integrated into a complete text-to-image pipeline. SLayR addresses a domain in which current text-to-image pipelines struggle: generating scene layouts with significant variety and plausibility when the given prompt is ambiguous and does not constrain the scene. In this setting, SLayR surpasses existing baselines, including LLMs. To evaluate layout generation accurately, we introduce a new benchmark suite that includes numerical metrics and a carefully designed, repeatable human-evaluation procedure assessing the plausibility and variety of the generated images. We show that our method sets a new state of the art for achieving high plausibility and variety simultaneously, while having at least 3× fewer parameters.
Submission Number: 10