MCP-R1: Generalized Real-World Task Agent Mastering Dozens of Tools

Tianshuo Peng; Jiakang Yuan; Zhijie Zhong; Yilei Jiang; Bin Wang; Peng Ye; Tao Chen; LEI BAI; Xiangyu Yue; Bo Zhang

18 Sept 2025 (modified: 30 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0

Keywords: Large Language Model, Agentic, Model Context Protocol

Abstract: Modern agentic models require strong capabilities for orchestrating external tools to interact with complex environments. However, existing tool-integration approaches support only a narrow range of tools and lack a unified calling standard. Consequently, they devote little attention to real-world tasks and struggle to transfer to unseen tools. The Model Context Protocol (MCP) provides an open standard for two-way connections between external tools and agents. Building on this standard, we introduce MCP-R1, a new paradigm designed to enhance models’ universal tool-interaction capabilities. We first construct a virtual-real integrated MCP tool system supporting 17 MCP servers with 60+ tools, each sourced from real-world services to ensure diversity and authenticity during training. Based on this tool system, we further propose a scalable pipeline for generating multi-tool invocation data. In addition, going beyond the rule-based rewards commonly used in QA tasks, we introduce a trajectory-based reward mechanism to evaluate the agent’s performance on goal-driven tasks. Thanks to the unified tool-interaction standard and our training pipeline, MCP-R1 interacts generically with a broad set of tools and demonstrates strong performance on practical tasks across diverse scenarios, while flexibly adapting to unseen tools. Our experiments span several challenging domains, including search (GAIA, WebWalker), general tool calling (MCP-Universe), and practical task execution. The strong performance of MCP-R1 underscores the effectiveness of our training paradigm, offering valuable insights and a scalable approach for developing general agentic models.

Primary Area: foundation or frontier models, including LLMs

Submission Number: 10287
