Keywords: Active Perception, Robotic Manipulation, Multi-Agent Robotic Systems, Robot Learning
Abstract: Multi-agent manipulation naturally produces multiple task-driven viewpoints, as each robotic arm carries a wrist-mounted camera and moves through the scene while acting. However, these distributed observations are typically underutilized. We introduce MAAP (Multi-Agent Active Perception), a framework that treats every arm as a dual-purpose agent: simultaneously a manipulator and a perception source. MAAP combines a VLM-based orchestrator for selecting feasible workspace-role configurations with imitation controllers that aggregate multi-wrist observations. Across four collaborative tasks spanning no-occlusion to severe multi-phase occlusion, the best MAAP variant achieves 73.0\% average success vs.\ 62.5\% (single active view) and 56.5\% (global), with the largest gain on occlusion-heavy tasks (69\% vs.\ 14\%). Frozen DINOv2 features substantially stabilize multi-view fusion, rescuing brittle channel fusion (24.0\%$\rightarrow$70.5\%).
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 7