Reducing Cognitive Overhead in Tool Use via Multi-Small-Agent Reinforcement Learning

ICLR 2026 Conference Submission 19399 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Multi-Agent Reinforcement Learning
TL;DR: We propose MSARL, a dual-agent framework that addresses cognitive-load interference in single-agent systems by decoupling high-level reasoning from low-level tool interpretation.
Abstract: Recent progress in multi-agent systems highlights the promise of specialized agents that collaborate through a division of labor. In contrast, most tool-augmented reasoning systems still adopt a single-agent paradigm, where one large model must interleave high-level reasoning with fine-grained tool operations, a process that often leads to cognitive-load interference and unstable outputs. We propose MSARL (Multi-Small-Agent Reinforcement Learning), a novel framework that explicitly decouples reasoning from tool execution and interpretation. In MSARL, a dedicated reasoning agent focuses on strategic problem decomposition and planning, while a specialized tool agent processes long and complex tool outputs, acting as an adaptive condenser to bridge information gaps. This role-specific separation not only reduces cognitive interference but also accelerates information flow. To enable effective collaboration, we introduce a hierarchical reinforcement learning approach that uses role-specific and collaboration-based rewards, providing granular feedback to the tool agent and a holistic, trajectory-level signal to the reasoning agent. On mathematical problem-solving with code execution, MSARL achieves more stable reasoning and higher final-answer accuracy than strong single-agent baselines. Our findings indicate that this dual-agent architecture significantly mitigates hallucinations and increases the tendency to invoke tools, thereby improving overall robustness. Our method provides a scalable blueprint for building specialized multi-agent systems that can tackle complex reasoning tasks. The code for our method is available at: https://anonymous.4open.science/r/msarl-D50D/.
Primary Area: reinforcement learning
Submission Number: 19399