Keywords: mobile manipulation, policy evaluation and deployment, 3D representation
TL;DR: N2M bridges the gap between navigation and manipulation by predicting preferred initial poses from ego-centric point clouds. It adapts in real time (>30 Hz), needs only a few rollouts for training, and generalizes well to unseen environments.
Abstract: In mobile manipulation, the manipulation policy has strong preferences for the initial pose from which it is executed. However, the navigation module focuses solely on reaching the task area, without considering which initial pose is preferable for downstream manipulation.
We identify this critical yet highly overlooked problem and introduce N2M, a highly practical solution that guides the robot to a preferable initial pose after it reaches the task area, thereby substantially improving task success rates. N2M features five key advantages: (1) reliance solely on ego-centric observation, without requiring global or historical information; (2) real-time adaptation to environmental changes; (3) reliable prediction with high viewpoint robustness; (4) broad applicability across diverse tasks, manipulation policies, and robot hardware; and (5) remarkable data efficiency and generalizability.
N2M demonstrates state-of-the-art performance compared to prior methods: on the PnPCounterToCab and CloseDrawer tasks, it improves success rates by 3% to 54% over reachability-based methods and by 24% to 55% over the only existing policy-aware alternative, respectively.
Furthermore, in the Toybox Handover task, N2M provides reliable predictions even in unseen environments when trained on only 15 data samples, demonstrating remarkable data efficiency and generalizability.
**Anonymized project website: https://nav2manip.github.io**
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 4396