Keywords: large language models, tool usage, decoding method
TL;DR: We propose Tool Decoding, a training-free method grounded in an analysis of tool-use failures that improves LLM tool use via constrained decoding and order consistency. It yields significant gains, with some 7B models surpassing GPT-4 on key tasks.
Abstract: Despite the significant advancements in large language models (LLMs), their tool-use capabilities remain limited. This limitation stems from the fact that existing approaches often merely adapt strategies designed for basic natural language tasks, overlooking the specific challenges inherent in tool usage, such as precise tool selection, strict predefined formats, and accurate parameter assignment.
To bridge this gap, we conduct a fine-grained analysis of the tool usage process, breaking it down into three critical stages: tool awareness, tool selection, and tool call. Our analysis reveals that most failures stem from selection errors, format violations, and parameter mis-assignments.
Building on these insights, we propose \textbf{Tool Decoding}, a novel, training-free approach that directly incorporates tool-specific information into the decoding process. Tool Decoding employs constrained decoding to ensure format correctness and eliminate hallucinations, while leveraging order consistency to improve parameter accuracy through structured sampling and a majority-voting mechanism. This approach effectively addresses many common tool-use errors in a plug-and-play manner, allowing for seamless generalization to new tools as long as they are accompanied by well-structured documentation to guide the decoding process.
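To make the order-consistency idea concrete, here is a minimal, hypothetical sketch: a tool call's parameters are sampled once under each possible parameter ordering (each sample standing in for one constrained-decoding pass of the model), and the final value for each parameter is chosen by majority vote across samples. The names `sample_call`, `order_consistent_call`, and `majority_vote_parameters` are illustrative, not from the paper.

```python
from collections import Counter
from itertools import permutations


def majority_vote_parameters(samples):
    """Given a list of {param: value} dicts (one per sampled ordering),
    return the per-parameter majority value."""
    voted = {}
    for p in samples[0]:
        counts = Counter(s[p] for s in samples)
        voted[p] = counts.most_common(1)[0][0]
    return voted


def order_consistent_call(sample_call, param_names):
    """Sample the tool call under every permutation of the parameter
    order, then aggregate parameter values by majority vote.
    `sample_call(order)` stands in for one constrained-decoding pass."""
    samples = [sample_call(order) for order in permutations(param_names)]
    return majority_vote_parameters(samples)
```

In practice the number of sampled orderings would be bounded rather than enumerating all permutations, and each pass would be decoded under the format constraints derived from the tool's documentation; this sketch only shows the voting step.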
Experimental evaluations on benchmarks such as API-Bank and BFCL V2 Live show that Tool Decoding leads to significant improvements across a diverse set of more than 10 models, including both generalist and tool-finetuned models. Almost all models demonstrate performance gains exceeding 70\% on both benchmarks. Among the 7B-level models, five outperform GPT-3.5 on key tasks, with two even surpassing GPT-4.
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9531