Keywords: Tool-use, Tool-Integrated Reasoning, Large Language Model, Self-Refinement
TL;DR: We propose a lightweight Tool-use Refiner: a small, separately trained LLM that acts as a plug-and-play module to fix tool-use errors made by large LLMs, boosting their task completion rates without training the large models themselves.
Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in Tool-Integrated Reasoning (TIR). However, their practical application is often hindered by frequent errors in tool invocation, such as incorrect parameters or malformed formats. Prevailing training paradigms like Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) can mitigate these issues but demand substantial computational resources. To address this limitation, we propose a novel, resource-efficient refinement framework that enhances the tool-use capabilities of large-scale LLMs without training the large models themselves. We introduce a small-scale model, termed the Tool-use Refiner, which operates as a post-processing module. The Refiner takes the user's task and the initial tool-integrated reasoning produced by an upstream LLM as input, then performs its own reasoning to correct and improve the tool invocation. The Refiner is trained with Decoupled Clip and Dynamic Sampling Policy Optimization (DAPO), an RL algorithm that enables efficient and stable policy learning. On a diverse set of tool-use and reasoning benchmarks, our Refiner improves task completion rates and invocation accuracy over the raw outputs of various upstream LLMs. This highlights our Refiner as a lightweight, plug-and-play solution for improving the operational reliability of LLM-based agents. We release our code and model to facilitate future research.
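To make the plug-and-play setup concrete, the sketch below illustrates how the Refiner might sit as a post-processing stage between an upstream LLM and the tool executor. It is only a conceptual outline under assumed interfaces: the `generate` method, prompt wording, and JSON tool-call format are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the plug-and-play refinement pipeline (hypothetical interfaces).
import json


def refine_tool_call(upstream_llm, refiner_llm, task: str) -> dict:
    """Run the upstream LLM, then let the small Refiner correct its tool invocation."""
    # 1. The large upstream LLM produces its tool-integrated reasoning, which may
    #    contain incorrect parameters or malformed tool-call formats.
    draft = upstream_llm.generate(
        f"Task: {task}\nThink step by step and emit the tool call as JSON."
    )

    # 2. The Refiner receives both the task and the draft, performs its own
    #    reasoning, and emits a corrected invocation. The upstream model's
    #    weights are never touched; only its output is post-processed.
    refined = refiner_llm.generate(
        f"Task: {task}\nDraft reasoning and tool call:\n{draft}\n"
        "Fix any wrong parameters or malformed formats and return only the "
        "final tool call as JSON."
    )
    return json.loads(refined)
```

Because the Refiner only consumes text produced by the upstream model, it can in principle be paired with any upstream LLM without retraining that model.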
Primary Area: foundation or frontier models, including LLMs
Submission Number: 22806