Keywords: Autoregressive models, 3D shape generation
Abstract: Autoregressive models have shown promise in 3D shape generation by modeling complex spatial dependencies between discrete shape tokens. However, their sequential nature and token-by-token sampling limit scalability and generation speed, especially for high-resolution shapes. In this work, we propose SMART-3D (Scaling Masked AutoRegressive Transformers for 3D generation), a novel framework that combines the modeling capacity of autoregressive transformers with the efficiency of masked generation. By introducing a hierarchical token representation and a progressive masked generation schedule, SMART-3D enables parallel decoding of 3D structures without sacrificing autoregressive fidelity. We further optimize the model with spatially-aware masking and lightweight transformer blocks, allowing generation of detailed 3D shapes with significantly reduced computational overhead. Experiments on ShapeNet, ModelNet, and ShapeNet-55 datasets demonstrate that SMART-3D achieves state-of-the-art performance in both generation quality and speed, outperforming previous competitive baselines. Our approach offers a scalable and practical solution for high-fidelity 3D shape synthesis in real-world applications.
Primary Area: generative models
Submission Number: 9157