MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use

17 Sept 2025 (modified: 25 Jan 2026) — ICLR 2026 Conference Withdrawn Submission — CC BY 4.0
Keywords: Large Language Models, Agentic Tool Use, Reinforcement Learning, Multi-turn User Interaction
Abstract: Recent advances in agentic intelligence have highlighted the importance of agentic tool use in Large Language Models (LLMs), especially when interacting with users. During multi-turn interactions, the dynamic, uncertain, and stochastic nature of user demands challenges agents to iteratively refine their understanding of user needs through communication while invoking tools to resolve queries, rather than simply calling tools for results. Existing reinforcement learning (RL) approaches for tool use do not integrate genuinely dynamic users into the RL training process. To bridge this gap, we introduce MUA-RL (Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use), a novel reinforcement learning framework that, for the first time in the field of agentic tool use, integrates LLM-simulated users into the reinforcement learning loop. MUA-RL aims to enable models to autonomously learn to communicate with users efficiently and to use various tools to solve practical problems in dynamic multi-turn interactions. Evaluations on several benchmarks demonstrate that MUA-RL-32B outperforms or matches much larger open-source models such as DeepSeek-V3-0324 and Qwen3-235B-A22B in the non-thinking setting (see Figure 1).
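The core idea described in the abstract, placing an LLM-simulated user inside the RL rollout loop so the agent alternates between tool calls and user turns, can be sketched roughly as follows. This is a minimal illustration under assumed interfaces: `agent`, `simulated_user`, `tools`, `task`, and the `Action` fields are hypothetical placeholders, not the paper's actual API.

```python
def rollout(agent, simulated_user, tools, task, max_turns=8):
    """Collect one multi-turn trajectory with an LLM-simulated user in the loop.

    The agent either replies to the user or calls a tool each step; the
    simulated user reacts dynamically, and an outcome-based reward is
    assigned at the end for the RL update (hypothetical interfaces).
    """
    # The simulated user opens the conversation with an initial request.
    history = [{"role": "user", "content": simulated_user.open(task)}]
    for _ in range(max_turns):
        action = agent.step(history)  # reply to the user or request a tool call
        history.append({"role": "assistant", "content": action.text})
        if action.tool_call:
            # Invoke the requested tool and feed the result back to the agent.
            result = tools[action.tool_call.name](**action.tool_call.args)
            history.append({"role": "tool", "content": str(result)})
        else:
            # The simulated user responds to the agent's message and may
            # signal that its (stochastic, evolving) need has been resolved.
            reply, done = simulated_user.respond(history)
            history.append({"role": "user", "content": reply})
            if done:
                break
    reward = task.verify(history)  # outcome-based reward for the RL update
    return history, reward
```

The key design point mirrored here is that the user is a live participant in training rollouts rather than a fixed prompt, so the policy is optimized against conversations whose course it cannot fully predict.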
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8662