Synergistic Push, Grasp and Active Vision with Evidential Learning for Manipulation-Enhanced Mapping in Confined Environments
Keywords: Deep learning in grasping and manipulation, Perception for grasping and manipulation, Deep Learning for Visual Perception
TL;DR: MS-MEM unifies active viewing, pushing, and grasping under evidential uncertainty modeling, allowing a robot to actively reduce occlusions and improve mapping accuracy while minimizing unnecessary disturbance to the scene.
Abstract: We propose Multi-Skill Manipulation-Enhanced Mapping (MS-MEM), a hierarchical evidential framework that jointly reasons over active viewpoint selection, non-prehensile pushing, and prehensile grasping for uncertainty-aware occlusion mapping. MS-MEM couples a scene-level metric-semantic belief map with a local grasp representation based on a full-evidential extension of vMF-Contact (FE-vMF) that models both grasp affordance and directional uncertainty. To maintain consistent grasp decisions across views, we develop FE-UMGF, an uncertainty-guided multi-view fusion module that aggregates evidential grasp hypotheses over time. Within a POMDP formulation, candidate push and grasp actions are evaluated by their disturbance- and occlusion-aware information gain (DOIG), a novel objective that allows the robot to choose whether to observe, rearrange clutter, or remove specific occluders. Experiments show that MS-MEM combines the strengths of both manipulation skills, yielding high mapping accuracy and substantially lower scene disturbance through disturbance-aware action selection.
Submission Number: 8