MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use

17 Sept 2025 (modified: 25 Jan 2026) — ICLR 2026 Conference Withdrawn Submission — CC BY 4.0
Keywords: Large Language Models, Agentic Tool Use, Reinforcement Learning, Multi-turn User Interaction
Abstract: Recent advances in agentic intelligence have highlighted the importance of agentic tool use in Large Language Models (LLMs), especially when interacting with users. During multi-turn interactions, the dynamic, uncertain, and stochastic nature of user demands challenges agents to iteratively refine their understanding of user needs through communication while invoking tools to resolve queries, rather than simply calling tools for results. Existing reinforcement learning (RL) approaches for tool use do not integrate genuinely dynamic users into the RL training process. To bridge this gap, we introduce MUA-RL (Multi-turn User-interacting Agent Reinforcement Learning for agentic tool use), a novel reinforcement learning framework that, for the first time in the field of agentic tool use, integrates LLM-simulated users into the reinforcement learning loop. MUA-RL aims to enable models to autonomously learn to communicate with users efficiently and to use various tools to solve practical problems in dynamic multi-turn interactions. Evaluations on several benchmarks demonstrate that MUA-RL-32B outperforms or matches much larger open-source models such as DeepSeek-V3-0324 and Qwen3-235B-A22B in the non-thinking setting (see Figure 1).
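The core idea described in the abstract, placing an LLM-simulated user inside the RL rollout loop so the agent alternates between tool calls and user turns, can be sketched roughly as follows. This is a minimal illustration under assumed interfaces: `agent`, `simulated_user`, `tools`, `task`, and the `Action` fields are hypothetical placeholders, not the paper's actual API.

```python
def rollout(agent, simulated_user, tools, task, max_turns=8):
    """Collect one multi-turn trajectory with an LLM-simulated user in the loop.

    The agent either replies to the user or calls a tool each step; the
    simulated user reacts dynamically, and an outcome-based reward is
    assigned at the end for the RL update (hypothetical interfaces).
    """
    # The simulated user opens the conversation with an initial request.
    history = [{"role": "user", "content": simulated_user.open(task)}]
    for _ in range(max_turns):
        action = agent.step(history)  # reply to the user or request a tool call
        history.append({"role": "assistant", "content": action.text})
        if action.tool_call:
            # Invoke the requested tool and feed the result back to the agent.
            result = tools[action.tool_call.name](**action.tool_call.args)
            history.append({"role": "tool", "content": str(result)})
        else:
            # The simulated user responds to the agent's message and may
            # signal that its (stochastic, evolving) need has been resolved.
            reply, done = simulated_user.respond(history)
            history.append({"role": "user", "content": reply})
            if done:
                break
    reward = task.verify(history)  # outcome-based reward for the RL update
    return history, reward
```

The key design point mirrored here is that the user is a live participant in training rollouts rather than a fixed prompt, so the policy is optimized against conversations whose course it cannot fully predict.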
Supplementary Material: zip
Primary Area: foundation or frontier models, including LLMs
Submission Number: 8662