CymbaDiff: Structured Spatial Diffusion for Sketch-based 3D Semantic Urban Scene Generation

Li Liang, Bo Miao, Xinyu Wang, Naveed Akhtar, Jordan Vice, Ajmal Saeed Mian

Published: 18 Sept 2025, Last Modified: 25 Jan 2026. Venue: NeurIPS 2025 poster. License: CC BY 4.0

Keywords: 3D semantic scene generation, 3D semantic scene completion

TL;DR: We introduce a new task, sketch-based 3D outdoor scene generation, present the first dataset for 3D outdoor scene generation, and propose CymbaDiff for 3D semantic scene generation.

Abstract: Outdoor 3D semantic scene generation produces realistic and semantically rich environments for applications such as urban simulation and autonomous driving. However, advances in this direction are constrained by the absence of publicly available, well-annotated datasets. We introduce SketchSem3D, the first large-scale benchmark for generating 3D outdoor semantic scenes from abstract freehand sketches and pseudo-labeled annotations of satellite images. SketchSem3D includes two subsets, Sketch-based SemanticKITTI and Sketch-based KITTI-360 (each containing LiDAR voxels along with their corresponding sketches and annotated satellite images), to enable standardized, rigorous, and diverse evaluations. We also propose Cylinder Mamba Diffusion (CymbaDiff), which significantly enhances spatial coherence in outdoor 3D scene generation. CymbaDiff imposes structured spatial ordering, explicitly captures cylindrical continuity and vertical hierarchy, and preserves both physical neighborhood relationships and global context within the generated scenes. Extensive experiments on SketchSem3D demonstrate that CymbaDiff achieves superior semantic consistency, spatial realism, and cross-dataset generalization. The code and dataset will be made available.
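
The abstract does not specify how CymbaDiff orders voxels, so the following is only an illustrative sketch of what a "structured spatial ordering" with cylindrical continuity and vertical hierarchy could look like when a 3D voxel grid is flattened into a 1D sequence for a sequence model. The function name `cylindrical_order`, the grid-centre origin, and the sort keys (height, then ring, then azimuth) are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def cylindrical_order(shape, center=None):
    """Hypothetical helper: flat indices of an (X, Y, Z) voxel grid,
    sorted by height layer, then radial ring, then azimuth angle."""
    X, Y, Z = shape
    if center is None:
        # Assumption: the cylinder axis passes through the grid centre.
        center = ((X - 1) / 2.0, (Y - 1) / 2.0)
    xs, ys, zs = np.meshgrid(
        np.arange(X), np.arange(Y), np.arange(Z), indexing="ij"
    )
    dx = xs - center[0]
    dy = ys - center[1]
    ring = np.round(np.hypot(dx, dy)).astype(int)  # discretised radius
    azimuth = np.arctan2(dy, dx)                   # angle in (-pi, pi]
    # np.lexsort treats the LAST key as the primary sort key.
    return np.lexsort((azimuth.ravel(), ring.ravel(), zs.ravel()))

# Usage: serialise a toy semantic voxel grid into a 1D sequence.
labels = np.arange(4 * 4 * 2).reshape(4, 4, 2)
sequence = labels.ravel()[cylindrical_order(labels.shape)]
```

In this sketch, voxels in the same height layer stay contiguous in the sequence, and within a layer neighbouring voxels on the same ring appear next to each other, which is one simple way to keep physically adjacent voxels close in a 1D scan.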

Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)

Submission Number: 25505
