Tool-Integrated Reasoning via Hierarchical Multi-Agent Reinforcement Learning

ACL ARR 2026 January Submission4979 Authors

05 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: Tool-Integrated Reasoning, Reinforcement Learning, Multi-agent System
Abstract: Recent advances in Tool-Integrated Reasoning (TIR) equip Large Language Models (LLMs) with external utilities to overcome intrinsic deficits in knowledge currency and numerical computation. However, existing methods face a dilemma: single-agent models lack structural interpretability because planning and execution are entangled, while multi-agent systems suffer from unstable tool usage caused by misaligned optimization between high-level planners and low-level executors. To address these challenges, we propose a **T**ree-structured **M**ulti-**A**gent **T**ool **R**easoning framework optimized via **H**ierarchical **R**einforcement **L**earning (TMATR-HRL). Specifically, we introduce the TMATR system, which structures reasoning as a hierarchical tree of atomic decisions, explicitly decoupling strategic planning from step-wise execution to enhance interpretability. To ensure stability and coordination, we employ an HRL scheme that enables the co-evolution of planning and execution policies through alternating on-policy updates. Experiments on mathematical reasoning and question-answering benchmarks show that TMATR-HRL consistently improves performance across multiple models, while exhibiting significant advantages in the controllability and interpretability of tool usage.
Paper Type: Long
Research Area: AI/LLM Agents
Research Area Keywords: LLM agents, tool use, reinforcement learning in agents, multi-agent systems
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 4979