Abstract: Large language models (LLMs) show great promise in solving complex tasks through external tool use. However, existing approaches largely focus on standard tool formats and instruction following, neglecting the broader problem of generalizable and robust tool interaction. In this paper, we examine three underexplored challenges in LLMs' tool-use capabilities: adaptation to counterintuitive tool rules, autonomous discovery of tool functionality under incomplete specifications, and the impact of historical memory on tool-use efficiency. To address these challenges, we propose Path-Aware Reinforcement Learning (PARL), a novel framework that integrates trajectory-level reward assignment with history-based contextualization. PARL assigns dynamic rewards based on path-level outcomes and relative tool-use efficiency, while maintaining a fixed-size memory window to guide policy learning. Experiments across diverse non-standard tool-use scenarios demonstrate that PARL consistently outperforms existing methods, achieving relative gains of up to 28.9% in low-information settings and 19.8% in counterfactual reasoning. Our work provides both a diagnostic benchmark and an effective reinforcement learning strategy for advancing tool-augmented LLMs. Our code and dataset will be available at XXX.
Paper Type: Long
Research Area: Dialogue and Interactive Systems
Research Area Keywords: Large Language Models, function calling
Languages Studied: English
Submission Number: 5566