RoboMonster: Compositional Generalization of Heterogeneous Multi-End Effector Embodied Agents

Yiran Qin, Zhemeng Zhang, Heng Zhou, Li Kang, Bruno N.Y. Chen, Ximeng Meng, Xiufeng Song, Jiahua Ma, Zhenfei Yin, Xiaohong Liu, Philip Torr, Lei Bai, Ruimao Zhang

16 Sept 2025 (modified: 14 Nov 2025) | ICLR 2026 Conference Withdrawn Submission | License: CC BY 4.0

Keywords: Compositional Generalization, Robot Manipulation, Robot Planning

Abstract: The rapid growth of robotics has been driven by advances in both hardware and algorithms, yet a fundamental gap remains between real-world decision-making and virtual simulation. Traditional designs, such as a single gripper or human-like dual arms, often fail to fully exploit algorithmic capabilities and struggle with tasks constrained by embodiment, such as lifting thin cards or manipulating heavy, bulky objects. To address this hardware–software mismatch, we introduce RoboMonster, a new paradigm that integrates heterogeneous end-effectors with a cross-end-effector embodied planning brain. RoboMonster reasons over visual inputs, task instructions, and the properties of its diverse end-effectors to select and coordinate the optimal agents, decomposing complex problems into executable sub-tasks. We design four specialized end-effectors, train a corresponding policy for each, and develop a high-level planner that combines logical, spatial, and temporal constraints to ensure safe and efficient multi-arm collaboration. Experiments across challenging tasks show that RoboMonster significantly outperforms systems that rely on a single gripper, highlighting the advantages of combining heterogeneous end-effectors with structured planning for embodied intelligence.
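To make the planning idea in the abstract more concrete, here is a minimal, hypothetical sketch; it is not the authors' planner. It assumes that each end-effector can be summarized by a hand-written set of capability labels, which stands in for the planner's reasoning over visual inputs and instructions. Each sub-task is assigned to a capable end-effector, which acts as the logical constraint. Sub-tasks are then packed into parallel steps so that dependencies finish first (temporal constraint), no two sub-tasks in a step share a workspace region (spatial constraint), and each end-effector runs at most one sub-task per step. All names (`EndEffector`, `SubTask`, `assign`, `schedule`, the capability labels, and the example end-effectors) are illustrative assumptions.

```python
"""Hypothetical sketch: capability-based end-effector assignment plus
constraint-aware step scheduling. Not the RoboMonster implementation."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EndEffector:
    name: str
    capabilities: frozenset[str]  # hand-written labels (assumption)


@dataclass(frozen=True)
class SubTask:
    name: str
    required: frozenset[str]     # capabilities needed (logical constraint)
    region: str                  # workspace region occupied (spatial constraint)
    after: tuple[str, ...] = ()  # sub-tasks that must finish first (temporal constraint)


def assign(tasks: list[SubTask], effectors: list[EndEffector]) -> dict[str, str]:
    """Map each sub-task to the most specialized end-effector that covers it."""
    assignment: dict[str, str] = {}
    for task in tasks:
        candidates = [e for e in effectors if task.required <= e.capabilities]
        if not candidates:
            raise ValueError(f"no end-effector can perform {task.name!r}")
        # Prefer the smallest capability set; break ties by name for determinism.
        best = min(candidates, key=lambda e: (len(e.capabilities), e.name))
        assignment[task.name] = best.name
    return assignment


def schedule(tasks: list[SubTask], assignment: dict[str, str]) -> list[list[tuple[str, str]]]:
    """Greedily pack ready sub-tasks into parallel steps.

    Within a step, each end-effector runs at most one sub-task and each
    workspace region hosts at most one sub-task.
    """
    done: set[str] = set()
    remaining = list(tasks)
    steps: list[list[tuple[str, str]]] = []
    while remaining:
        ready = [t for t in remaining if all(dep in done for dep in t.after)]
        if not ready:
            raise ValueError("unsatisfiable or cyclic temporal constraints")
        step: list[tuple[str, str]] = []
        busy_regions: set[str] = set()
        busy_effectors: set[str] = set()
        for task in ready:
            effector = assignment[task.name]
            if task.region in busy_regions or effector in busy_effectors:
                continue  # conflict: defer this sub-task to a later step
            step.append((task.name, effector))
            busy_regions.add(task.region)
            busy_effectors.add(effector)
        done.update(name for name, _ in step)
        remaining = [t for t in remaining if t.name not in done]
        steps.append(step)
    return steps


# Hypothetical example: three end-effectors and four sub-tasks.
EFFECTORS = [
    EndEffector("parallel_gripper", frozenset({"grasp", "place"})),
    EndEffector("suction_cup", frozenset({"lift_flat", "place"})),
    EndEffector("wide_palm", frozenset({"push", "support_heavy"})),
]
TASKS = [
    SubTask("lift_card", frozenset({"lift_flat"}), "table_left"),
    SubTask("support_box", frozenset({"support_heavy"}), "table_right"),
    SubTask("grasp_lid", frozenset({"grasp"}), "table_right"),
    SubTask("place_card", frozenset({"lift_flat", "place"}), "shelf", after=("lift_card",)),
]
PLAN = schedule(TASKS, assign(TASKS, EFFECTORS))

if __name__ == "__main__":
    for i, step in enumerate(PLAN, start=1):
        print(f"step {i}: {step}")
```

In this example, `grasp_lid` is deferred to the second step because `support_box` already occupies `table_right`. `place_card` waits for `lift_card` to finish. The greedy packing is a deliberately simple choice that always makes progress, because the first ready sub-task in each step is always scheduled. A real planner could instead search over assignments and orderings.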

Primary Area: applications to computer vision, audio, language, and other modalities

Submission Number: 7876

Loading