VoxSet: Sparse Voxel Set Tokenizer for 3D Shape Generation

03 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: 3D Tokenizer, 3D Generation
TL;DR: A novel 3D shape tokenizer that takes advantage of both sparse voxel and vector set representations.
Abstract: 3D tokenizers are crucial in latent 3D generative models. Recent sparse voxel tokenizers can reconstruct detailed shapes but produce variable-length latent tokens, which necessitates a two-stage generation pipeline. Conversely, vector set tokenizers produce fixed-length latent tokens with higher compression but struggle with reconstruction quality. In this work, we introduce VoxSet, a novel tokenizer that combines the strengths of both approaches. Our method employs sparse voxels in the outer layers to capture fine surface details and a vector set bottleneck for high compression. This design achieves high-quality reconstructions while maintaining a compact, fixed-length latent code across objects, eliminating the extra generation stage required by sparse voxel methods. Experiments demonstrate that VoxSet achieves reconstruction quality competitive with sparse voxel tokenizers while sharing the simpler training and inference pipeline of vector set-based 3D generation models.
Primary Area: generative models
Submission Number: 1584
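
Illustrative sketch: the abstract says a vector set bottleneck turns variable-length sparse voxel tokens into a fixed-length latent code, but does not say how. One common way to get that property is cross-attention from a fixed set of query vectors, and the minimal pure-Python sketch below assumes that mechanism. It is not taken from the submission. The function name `cross_attention_bottleneck`, the sizes `DIM` and `NUM_LATENTS`, and the random "learned" queries are all hypothetical, and the key/value projections of a real attention layer are omitted for brevity.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_bottleneck(voxel_tokens, queries):
    """Compress N voxel tokens (N varies per shape) into len(queries) latents.

    Each latent is an attention-weighted average of the voxel tokens, so the
    output length depends only on the number of queries, not on N.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    latents = []
    for q in queries:
        # Scaled dot-product score between this query and every voxel token.
        scores = [scale * sum(qi * ti for qi, ti in zip(q, tok))
                  for tok in voxel_tokens]
        weights = softmax(scores)
        # Weighted average of the voxel tokens, one feature dimension at a time.
        latents.append([sum(w * tok[d] for w, tok in zip(weights, voxel_tokens))
                        for d in range(dim)])
    return latents


def random_tokens(rng, count, dim):
    """Stand-in for encoder features; real tokens would come from a network."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM, NUM_LATENTS = 8, 4  # hypothetical sizes chosen for illustration only

queries = random_tokens(rng, NUM_LATENTS, DIM)  # learned in practice (assumption)
small_shape = random_tokens(rng, 37, DIM)       # a shape with few occupied voxels
large_shape = random_tokens(rng, 512, DIM)      # a shape with many occupied voxels

z_small = cross_attention_bottleneck(small_shape, queries)
z_large = cross_attention_bottleneck(large_shape, queries)
print(len(z_small), len(z_large))  # both equal NUM_LATENTS
```

Because both shapes map to the same number of latent tokens, a single generative model could in principle be trained on the latents directly, which is the property the abstract credits with removing the extra generation stage.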