ARGOS: Hierarchical Autoregressive Generation of Unbounded 3D Outdoor Scenes with High Fidelity and Spatial Control

16 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: unbounded 3D outdoor scene generation; autoregressive generation; high-fidelity geometry reconstruction; spatial control
Abstract: We present ARGOS, a hierarchical autoregressive framework for generating unbounded 3D outdoor scenes with high fidelity and spatial control. Existing methods for large-scale 3D scene generation are limited by a fundamental trade-off between global consistency and fine geometric detail. While prior diffusion-based approaches struggle with long-range coherence, our framework resolves this tension by decoupling the challenge into two stages. First, a causal autoregressive model establishes a globally coherent layout by processing scene chunks in sequence. Second, a masked autoregressive model generates detailed local geometry for each chunk, conditioned on this global layout and on neighboring chunks. These geometric latents are then decoded by our enhanced VAE to ensure high-fidelity reconstruction. To enable user control, we introduce an automated pipeline that extracts complex spatial relationships from scenes, producing a structured dataset that allows ARGOS to follow precise text-based commands. Comprehensive experiments demonstrate that ARGOS significantly outperforms existing methods in unconditional generation, achieving superior FPD and KPD metrics across multiple scales. For text-conditioned synthesis, our approach excels at generating coherent, large-scale scenes that precisely adhere to complex spatial instructions.
Primary Area: generative models
Submission Number: 7628
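Illustrative sketch (not from the submission): the two-stage decoupling described in the abstract can be pictured with the toy Python below. It is a minimal sketch under stated assumptions, not the authors' implementation. Every name here (`generate_layout`, `generate_chunk_latent`, `generate_scene`), the raster chunk order, the choice of upper and left chunks as neighbors, and the random stand-ins for learned models are hypothetical. The VAE decoding step is omitted.

```python
import random


def generate_layout(rows, cols, vocab_size, rng):
    """Stage 1 (toy): causal pass over chunks in raster order.

    Each chunk's layout token depends only on tokens generated earlier.
    The copy-or-sample rule is a stand-in for a learned next-token model.
    """
    flat = []
    for idx in range(rows * cols):
        history = flat[:idx]  # only earlier chunks are visible (causality)
        if history and rng.random() < 0.7:
            token = rng.choice(history)
        else:
            token = rng.randrange(vocab_size)
        flat.append(token)
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def generate_chunk_latent(layout_token, neighbor_latents, length, steps, rng):
    """Stage 2 (toy): masked autoregressive fill of one chunk's latent.

    All positions start masked (None); each step unmasks a random subset,
    conditioned on the layout token and on already generated neighbors.
    """
    context = float(layout_token)
    if neighbor_latents:
        means = [sum(n) / len(n) for n in neighbor_latents]
        context = 0.5 * context + 0.5 * sum(means) / len(means)
    latent = [None] * length
    masked = list(range(length))
    rng.shuffle(masked)  # random unmasking order
    per_step = -(-length // steps)  # ceiling division
    while masked:
        batch, masked = masked[:per_step], masked[per_step:]
        for pos in batch:
            # Stand-in for a learned predictor of the masked latent entries.
            latent[pos] = context + rng.gauss(0.0, 0.1)
    return latent


def generate_scene(rows=2, cols=3, vocab_size=8, latent_len=4, steps=2, seed=0):
    """Run stage 1, then stage 2 per chunk; returns (layout, latents)."""
    rng = random.Random(seed)
    layout = generate_layout(rows, cols, vocab_size, rng)
    latents = {}
    for r in range(rows):
        for c in range(cols):
            # Assumed neighborhood: the chunks above and to the left.
            neighbors = [latents[k] for k in ((r - 1, c), (r, c - 1)) if k in latents]
            latents[(r, c)] = generate_chunk_latent(
                layout[r][c], neighbors, latent_len, steps, rng
            )
    return layout, latents
```

Calling `generate_scene()` returns a grid of layout tokens and a dictionary of per-chunk latent vectors. In a real system, each latent would then be passed to a decoder to produce geometry.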