Abstract: The automated generation of 3D city scenes has attracted considerable attention due to its broad applications in areas such as virtual reality, urban planning, and digital media. Traditional approaches to constructing 3D city environments typically depend on labor-intensive manual modeling or on complex trained models whose outputs are not editable. To overcome these limitations, we propose a framework that generates fully editable 3D city scenes directly from natural language descriptions. Our framework uses a structured data extraction process to decouple model and layout features from the textual description, enabling the creation of 2D layouts that guide the generation of 3D terrains. Furthermore, we introduce a constrained 3D terrain generation method that ensures consistency with the semantic content and spatial relationships specified in the input text. By integrating 2D layouts, 3D terrains, and procedural modeling techniques, our framework produces a fully editable 3D environment in which users can efficiently customize and modify scene components. Experimental results show that our method offers substantial improvements in flexibility, realism, and ease of use, positioning it as a promising step toward democratizing 3D city modeling.
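To make the pipeline summarized above concrete, the following Python sketch illustrates one plausible interpretation of its stages: scene objects extracted from text are rasterized into a 2D semantic layout, which then constrains a terrain heightmap. All names here (SceneObject, build_layout_grid, synthesize_terrain) and the labeling scheme are hypothetical assumptions for illustration, not the authors' actual implementation.

```python
# Hypothetical sketch of the text -> 2D layout -> constrained terrain pipeline.
# Names and parameters are illustrative and do not reflect the paper's code.
from dataclasses import dataclass
import numpy as np

@dataclass
class SceneObject:
    kind: str      # e.g. "building", "park", "river" (assumed label set)
    x: int         # grid column of the object's anchor
    y: int         # grid row of the object's anchor
    width: int
    depth: int

def build_layout_grid(objects, size=64):
    """Rasterize extracted objects into a 2D semantic layout grid."""
    layout = np.zeros((size, size), dtype=np.int32)
    labels = {"building": 1, "park": 2, "river": 3}
    for obj in objects:
        label = labels.get(obj.kind, 0)
        layout[obj.y:obj.y + obj.depth, obj.x:obj.x + obj.width] = label
    return layout

def synthesize_terrain(layout, noise_scale=2.0, seed=0):
    """Generate a heightmap constrained by the layout: river cells are carved
    below their surroundings and building footprints are flattened."""
    rng = np.random.default_rng(seed)
    h = rng.normal(0.0, noise_scale, layout.shape)
    # One smoothing pass so the base terrain is not pure per-cell noise.
    h = (h + np.roll(h, 1, 0) + np.roll(h, -1, 0)
           + np.roll(h, 1, 1) + np.roll(h, -1, 1)) / 5.0
    h[layout == 3] = h.min() - 1.0            # carve river beds
    if np.any(layout == 1):
        h[layout == 1] = h[layout == 1].mean()  # flatten building footprints
    return h

if __name__ == "__main__":
    # Toy "extraction" result for: "a riverside district with a park and two towers"
    objects = [
        SceneObject("river", x=0, y=28, width=64, depth=8),
        SceneObject("park", x=10, y=40, width=20, depth=15),
        SceneObject("building", x=40, y=10, width=6, depth=6),
        SceneObject("building", x=50, y=12, width=6, depth=6),
    ]
    layout = build_layout_grid(objects)
    terrain = synthesize_terrain(layout)
    print(layout.shape, terrain.shape)
```

In a full system, the extracted objects and the resulting layout and heightmap would feed a procedural modeling stage, so that each scene component remains individually editable rather than baked into a single mesh.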