Keywords: 3D Reconstruction; Articulated Objects
Abstract: Articulated object perception is essential for intelligent agents in robotics, embodied AI, and augmented reality, yet reconstructing the geometry and kinematics of such objects from sparse RGB images remains a significant challenge. Traditional optimization-based methods, such as those built on NeRFs or 3DGS, deliver high fidelity but demand time-intensive per-object optimization, while data-driven approaches are constrained by the scarcity of 3D articulation datasets, which restricts their generalization to real-world scenarios.
To address these limitations, we introduce a novel, fully training-free and feed-forward framework that reconstructs and analyzes articulated objects from 1-4 sparse, unposed RGB images per state, captured in two articulation states of the object. Our approach leverages pre-trained models for unified geometric-semantic processing without any fine-tuning, enabling efficient inference of part correspondences and joint types, followed by a lightweight optimization for joint parameter estimation.
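The abstract does not specify the pre-trained backbones or their interfaces, so the following is only a minimal, self-contained sketch (NumPy only, with synthetic stand-in data) of the joint-analysis stage it describes: matching part points across the two states by mutual nearest neighbors in feature space, fitting a rigid transform to the matches, and classifying the joint as revolute or prismatic while recovering its axis. All function names and thresholds here are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def mutual_nearest_neighbors(feat_a, feat_b):
    """Index pairs (i, j) whose features are nearest neighbors of each other."""
    d = np.linalg.norm(feat_a[:, None, :] - feat_b[None, :, :], axis=-1)
    a2b = d.argmin(axis=1)            # best match in B for each point in A
    b2a = d.argmin(axis=0)            # best match in A for each point in B
    idx_a = np.arange(len(feat_a))
    keep = b2a[a2b] == idx_a          # keep only mutually consistent matches
    return np.stack([idx_a[keep], a2b[keep]], axis=1)

def fit_rigid(p, q):
    """Least-squares rigid transform with q ~= R @ p + t (Kabsch algorithm)."""
    cp, cq = p.mean(axis=0), q.mean(axis=0)
    H = (p - cp).T @ (q - cq)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                # proper rotation (det = +1)
    return R, cq - R @ cp

def classify_joint(R, t, angle_thresh_deg=5.0):
    """Label the relative part motion as revolute or prismatic."""
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.degrees(np.arccos(cos_angle))
    if angle > angle_thresh_deg:      # significant rotation -> revolute
        w, V = np.linalg.eig(R)       # axis = eigenvector with eigenvalue 1
        axis = np.real(V[:, np.argmin(np.abs(w - 1.0))])
        axis /= np.linalg.norm(axis)
        # A pivot on the axis satisfies (I - R) @ pivot = t; the matrix is
        # singular along the axis, so take the least-squares solution.
        pivot, *_ = np.linalg.lstsq(np.eye(3) - R, t, rcond=None)
        return "revolute", axis, pivot, angle
    direction = t / (np.linalg.norm(t) + 1e-12)
    return "prismatic", direction, None, float(np.linalg.norm(t))

# Synthetic stand-in for the pre-trained models' outputs: a moving part
# observed in two states, rotated 30 degrees about the z-axis.
rng = np.random.default_rng(0)
pts_a = rng.uniform(-1.0, 1.0, size=(200, 3))   # part points in state A
feats = rng.normal(size=(200, 32))              # shared per-point features
theta = np.radians(30.0)
R_gt = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                 [np.sin(theta),  np.cos(theta), 0.0],
                 [0.0,            0.0,           1.0]])
pts_b = pts_a @ R_gt.T                          # part points in state B

pairs = mutual_nearest_neighbors(feats, feats)  # identical features -> identity matches
R, t = fit_rigid(pts_a[pairs[:, 0]], pts_b[pairs[:, 1]])
kind, axis, pivot, magnitude = classify_joint(R, t)
print(kind, axis.round(3), round(magnitude, 1))  # -> revolute, axis ~ +/-[0, 0, 1], 30.0
```

In the actual pipeline, the part points and per-point features would come from the pre-trained geometric-semantic models applied to the two image sets, and the fitted transform would feed the subsequent joint parameter optimization.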
Because the design is dataset-independent, fully training-free, and feed-forward, eliminating the need for per-object training or extensive optimization iterations, our method effectively bridges the synthetic-to-real gap and achieves superior performance on real-world objects. By integrating end-to-end zero-shot reconstruction with this inference-and-optimization pipeline, it provides an efficient, robust solution for articulation modeling, advancing scalable applications in robotics.
Primary Area: applications to robotics, autonomy, planning
Submission Number: 5056