CAPT: Category-level Articulation Estimation from a Single Point Cloud Using Transformer

Lian Fu, Ryoichi Ishikawa, Yoshihiro Sato, Takeshi Oishi

Published: 2024, Last Modified: 01 Mar 2026, ICRA 2024, CC BY-SA 4.0
Abstract: The ability to estimate joint parameters is essential for various applications in robotics and computer vision. In this paper, we propose CAPT: category-level articulation estimation from a point cloud using Transformer. CAPT uses an end-to-end Transformer-based architecture to estimate the joint parameters and states of articulated objects from a single point cloud. It estimates these parameters and states for various articulated objects with high precision and robustness. We also introduce a motion loss approach, which improves articulation estimation performance by emphasizing the dynamic features of articulated objects. Additionally, we present a double voting strategy that provides the framework with coarse-to-fine parameter estimation. Experimental results on several category datasets demonstrate that our methods outperform existing alternatives for articulation estimation. Our research provides a promising solution for applying Transformer-based architectures in articulated object analysis.