DLM-3D: Diffusion Language Models for 3D Point Clouds Generation

18 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · Readers: Everyone · CC BY 4.0
Keywords: Diffusion language models, 3D point clouds generation
Abstract: Generating high-fidelity and diverse 3D point clouds is a fundamental challenge in 3D vision. Prior approaches primarily rely on autoregressive models or continuous diffusion processes, which often suffer from limited scalability, slow inference, and difficulties in modeling long-range dependencies across unordered point sets. In this work, we introduce DLM-3D, the first framework that adapts diffusion language models to the domain of 3D shape generation. Our key idea is to tokenize 3D point clouds into discrete semantic units and leverage discrete diffusion denoising over this sequence space, enabling parallel generation while preserving geometric fidelity. To better capture the intrinsic structure of point clouds, we design a permutation-invariant tokenizer and a geometry-aware noise schedule, which together allow DLM-3D to learn both local geometric consistency and global shape coherence. Extensive experiments on ShapeNet and ModelNet demonstrate that DLM-3D achieves state-of-the-art performance in terms of fidelity, diversity, and coverage, significantly outperforming autoregressive and continuous diffusion baselines. Moreover, DLM-3D supports flexible generation modes, including shape completion and conditional synthesis, without task-specific retraining.
Primary Area: generative models
Submission Number: 11025
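
The abstract describes denoising a sequence of discrete tokens in parallel but gives no implementation details. The following is a minimal sketch of that general idea, not the authors' method. It assumes an absorbing-state ("mask") corruption process, which is one common choice for diffusion language models. The names `MASK`, `corrupt`, `denoise`, and `toy_predict` are hypothetical, the token values stand in for codebook indices from some point-cloud tokenizer, and the predictor is a placeholder for a trained network. The permutation-invariant tokenizer and geometry-aware noise schedule mentioned in the abstract are not modeled here.

```python
import random

MASK = -1  # hypothetical id for the absorbing "mask" token


def corrupt(tokens, t, rng):
    """Forward process: replace each token with MASK independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def denoise(tokens, predict, steps, rng):
    """Reverse process: fill masked positions over a fixed number of parallel steps."""
    tokens = list(tokens)
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        # Reveal an equal share of the remaining masked positions each step;
        # the final step reveals everything that is still masked.
        k = -(-len(masked) // (steps - step))  # ceiling division
        proposals = predict(tokens)  # one proposal per position, computed at once
        for i in rng.sample(masked, k):
            tokens[i] = proposals[i]
    return tokens


rng = random.Random(0)
clean = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical codebook indices for one shape
noisy = corrupt(clean, t=0.5, rng=rng)


def toy_predict(toks):
    """Stand-in for a trained denoiser: proposes token 0 at every position."""
    return [0] * len(toks)


restored = denoise(noisy, toy_predict, steps=4, rng=rng)
```

Because `denoise` only ever writes to masked positions, tokens that are already known pass through unchanged. Under the same assumptions, this is how a partially observed token sequence could be completed without retraining: mask the unknown positions and run the reverse loop.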