Abstract: Synthesizing realistic 3D indoor scenes is a challenging task
that traditionally relies on manual arrangement and annotation by expert designers. Recent autoregressive models have automated
this process, but they often lack a semantic understanding of the relationships and hierarchies present in real-world scenes, which limits
their performance. In this paper, we propose Forest2Seq, a framework that
formulates indoor scene synthesis as an order-aware sequential learning
problem. Forest2Seq organizes the inherently unordered collection of
scene objects into structured, ordered hierarchical scene trees and forests.
By employing a clustering-based algorithm and a breadth-first traversal,
Forest2Seq derives meaningful orderings and utilizes a transformer
to generate realistic 3D scenes autoregressively. Experimental results on
standard benchmarks show that Forest2Seq synthesizes more realistic scenes than top-performing baselines, with
significant improvements in FID and KL scores. Additional experiments on downstream tasks and ablation studies confirm the importance of incorporating order as a prior in 3D scene generation.
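
As a rough illustration of the ordering idea described above, the following minimal sketch flattens a small scene forest into a sequence by breadth-first traversal. It is not the paper's implementation: the `SceneNode` class, the `forest_to_sequence` function, and the example object labels are hypothetical, and the forest is written by hand here rather than derived by a clustering step.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneNode:
    """Hypothetical scene-tree node: an object label plus its dependent objects."""
    label: str
    children: List["SceneNode"] = field(default_factory=list)


def forest_to_sequence(roots: List[SceneNode]) -> List[str]:
    """Flatten a forest into an ordered list of labels, level by level."""
    order: List[str] = []
    queue = deque(roots)  # start with every tree root, in the given order
    while queue:
        node = queue.popleft()
        order.append(node.label)
        queue.extend(node.children)  # children are visited after the current level
    return order


# Hand-built example forest (an assumption for illustration only).
bed = SceneNode("bed", [SceneNode("nightstand_left"), SceneNode("nightstand_right")])
desk = SceneNode("desk", [SceneNode("chair")])

sequence = forest_to_sequence([bed, desk])
print(sequence)
```

In this toy ordering, the root objects of each tree come first and their dependent objects follow, which is the kind of sequence an autoregressive model could then be trained to predict one element at a time.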