SMART-3D: Scaling Masked AutoRegressive Transformer for Efficient 3D Shape Generation

Shentong Mo; Yufei Guo

17 Sept 2025 (modified: 27 Nov 2025), ICLR 2026 Conference Withdrawn Submission, CC BY 4.0

Keywords: Autoregressive models, 3D shape generation

Abstract: Autoregressive models have shown promise in 3D shape generation by modeling complex spatial dependencies between discrete shape tokens. However, their sequential nature and token-by-token sampling limit scalability and generation speed, especially for high-resolution shapes. In this work, we propose SMART-3D (Scaling Masked AutoRegressive Transformers for 3D generation), a novel framework that combines the modeling capacity of autoregressive transformers with the efficiency of masked generation. By introducing a hierarchical token representation and a progressive masked generation schedule, SMART-3D enables parallel decoding of 3D structures without sacrificing autoregressive fidelity. We further optimize the model with spatially-aware masking and lightweight transformer blocks, allowing generation of detailed 3D shapes with significantly reduced computational overhead. Experiments on ShapeNet, ModelNet, and ShapeNet-55 datasets demonstrate that SMART-3D achieves state-of-the-art performance in both generation quality and speed, outperforming previous competitive baselines. Our approach offers a scalable and practical solution for high-fidelity 3D shape synthesis in real-world applications.

Primary Area: generative models

Submission Number: 9157
