Keywords: Active Perception, Robotic Manipulation, Multi-Agent Robotic Systems, Robot Learning
Abstract: Multi-agent manipulation naturally produces multiple task-driven viewpoints, as each robotic arm carries a wrist-mounted camera and moves through the scene while acting. However, these distributed observations are typically underutilized. We introduce MAAP (Multi-Agent Active Perception), a framework that treats every arm as a dual-purpose agent: simultaneously a manipulator and a perception source. MAAP combines a VLM-based orchestrator for selecting feasible workspace-role configurations with imitation controllers that aggregate multi-wrist observations. Across four collaborative tasks spanning no-occlusion to severe multi-phase occlusion, the best MAAP variant achieves 73.0\% average success vs.\ 62.5\% (single active view) and 56.5\% (global), with the largest gain on occlusion-heavy tasks (69\% vs.\ 14\%). Frozen DINOv2 features substantially stabilize multi-view fusion, rescuing brittle channel fusion (24.0\%$\rightarrow$70.5\%).
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 7