Keywords: 3D city generation, procedural generation, agent system
TL;DR: CityGenAgent is a framework that lets anyone create and customize realistic 3D cities just by describing what they want in plain language.
Abstract: The automated generation of interactive 3D cities is a critical challenge with broad applications in autonomous driving, virtual reality, and embodied intelligence. While recent advances in generative models and procedural techniques have improved the realism and scalability of city generation, existing methods often struggle with high-fidelity asset creation, controllability, and manipulation. In this work, we present CityGenAgent, a natural language-driven framework based on large language models (LLMs) for hierarchical procedural generation of high-quality 3D cities. Our approach introduces two core programs, $\textbf{Block Program}$ and $\textbf{Building Program}$, which decompose city generation into interpretable and editable components. $\textbf{BlockGen}$ and $\textbf{BuildingGen}$ are trained to generate and execute these programs. We design a Spatial Alignment Reward to enhance spatial reasoning and a Visual Consistency Reward to bridge the gap between textual program descriptions and their 3D visual realizations.
Additionally, because cities are represented as programs and the model generalizes beyond its training data, our framework allows users to manipulate the results via natural language. Comprehensive evaluations show that CityGenAgent achieves strong semantic alignment and higher visual quality, establishing a stronger foundation for broad applications.
Primary Area: generative models
Submission Number: 8493