Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking

Abstract

Articulated object manipulation requires precise object interaction, in which the object's motion axis must be carefully considered. Previous research has employed interactive perception for manipulating articulated objects, but these approaches are typically open-loop and often overlook the interaction dynamics. To address this limitation, we present a closed-loop pipeline that integrates interactive perception with online axis estimation from segmented 3D point clouds. Our method can build on any interactive perception technique, using it to induce slight object movement and generate point cloud frames of the evolving dynamic scene. These point clouds are then segmented using Segment Anything Model 2 (SAM2), after which the moving part of the object is masked for accurate online estimation of the motion axis, guiding subsequent robotic actions. Our approach significantly enhances the precision and efficiency of manipulation tasks involving articulated objects. Experimental results in simulated environments demonstrate that our method outperforms baseline approaches, particularly in tasks requiring precise axis-based control, highlighting the necessity of integrating real-time perception with online optimization for more efficient manipulation.



Our Pipeline

In our pipeline, an RGB-D camera captures the dynamic scene, which is induced by the slight movement from the Interactive Perception & Init-Manipulation Module. The captured scene is then processed by the Tracking & Segmentation Module, which tracks and segments the moving part of the articulated object at a 3D level. This segmented data is subsequently passed to the Axis Estimation & Manipulation Module. Here, the motion axis is explicitly calculated, providing informed guidance for the robot's manipulation policy.
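To make the last two modules more concrete, the following is a minimal sketch of how a masked depth frame might be lifted to 3D points and how a revolute axis could be recovered from two frames of the moving part. It is not the authors' implementation. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), and it assumes that row-wise point correspondences between the two frames are available, which the description above does not state. The function names `backproject`, `rigid_transform`, and `revolute_axis` are hypothetical, and the Kabsch algorithm is used here as a stand-in for whatever estimator the pipeline actually employs.

```python
import numpy as np


def backproject(depth, mask, fx, fy, cx, cy):
    """Lift masked depth pixels to 3D camera-frame points (pinhole model)."""
    # Keep only pixels that are inside the mask and have valid depth.
    v, u = np.nonzero(np.logical_and(mask, depth > 0))
    z = depth[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)


def rigid_transform(P, Q):
    """Kabsch fit: least-squares R, t such that Q ~= P @ R.T + t.

    Assumes row i of P corresponds to row i of Q (an assumption of this sketch).
    """
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    return R, t


def revolute_axis(R, t):
    """Return (direction, point_on_axis, angle) for a rotation about a fixed axis.

    Degenerate when the rotation angle is zero (e.g. a purely prismatic joint).
    """
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    # The skew-symmetric part of R encodes 2*sin(angle) * direction.
    w = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    direction = w / np.linalg.norm(w)
    # Points c on the axis satisfy (I - R) c = t. The matrix has rank 2, so
    # lstsq returns the minimum-norm solution: the axis point nearest the origin.
    point, *_ = np.linalg.lstsq(np.eye(3) - R, t, rcond=1e-8)
    return direction, point, angle
```

Given corresponding point sets `P` and `Q` from two frames, `revolute_axis(*rigid_transform(P, Q))` yields an axis estimate that could be refined as more frames arrive. For a prismatic joint such as a drawer, `R` is close to the identity and the motion direction would instead be the normalized translation `t`.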

Video


Results

We conduct our experiments in the SAPIEN simulator. Tasks involve opening doors or drawers to different extents.

We select an object from each category to visualize the manipulation process with online axis estimation refinement. The initial estimated axis is represented by a lighter shade of red, while the progressively refined axis is indicated by increasingly darker shades of red.

Success Rate of Basic Tasks

For each task, we evaluate our method against RGBManip and other baselines, separately on RGBManip's training set and testing set. The success rate over the first 100 experiments on each set is used as the comparison metric.

The results show that both our method and RGBManip outperform the other baselines in nearly all cases, and that our method consistently surpasses RGBManip on the basic tasks.

Success Rate of More Challenging Tasks

Results of Real-world Deployment

We demonstrate the effectiveness of our method in real-world deployment by visualizing the process of the online axis estimation.

Acknowledgements

Our code is built upon RGBManip, GroundingDINO and SAM2. We would like to thank the authors for their excellent work.