Breaking the 3D Dataset Bottleneck: Fast Scalable Generation of Aligned 3D Assets from Scratch for Category 6D Pose Estimation and Robotic Grasping
Keywords: Aligned 3D mesh generation, Category 6D pose estimation, Sim2real, Grasping
Abstract: While 2D vision has been revolutionized by large-scale datasets, 3D vision remains constrained by the scarcity of canonically aligned data. We introduce the first scalable, automated framework that generates complete category-level 6D pose datasets directly from text prompts, without relying on existing 3D assets. Our method provides: (1) reliable asset generation via a controlled text-to-image-to-3D pipeline; (2) built-in canonical alignment through depth-conditioned generation (96% pose consistency); and (3) large-scale 6D annotation via mixed-reality rendering. The pipeline produces aligned meshes in under 3 minutes per object (5–20× speedup). We generate over 1,000 instances for each of 153 categories (153,000 meshes, >40× increase per category). Extensive evaluation shows competitive zero-shot sim2real transfer on NOCS and superior robotic grasping (87.8% success), where aligned meshes prove essential. We release the largest publicly available aligned 3D mesh dataset, a category-level 6D pose dataset, grasping environments, and an open-source pipeline. Code and data: https://genomni3d.github.io/
Submission Number: 27