Keywords: LLM, Tool use, Inference Scaling, Cycle Consistency, Self-improvement
Abstract: The scaling of inference-time computation in large language models (LLMs) has emerged as a promising approach for enhancing reasoning capabilities by trading off inference-time and pre-training compute.
How to enable LLMs to use additional computation at test time to improve response accuracy is a crucial question for both academia and industry.
\textit{Proposer-Verifier}, a typical paradigm of inference scaling, often fails to generalize across scenarios.
In tool-use tasks in particular, effective verifiers are often unavailable, so errors accumulate over multiple reasoning steps.
In this work, we address these challenges by introducing \textbf{InfCycle}, a multi-stage data synthesis strategy that employs LLMs as data synthesizers and uses cycle consistency verification to ensure high-quality trajectory generation.
This approach exploits step-wise cycle consistency among synthesized trajectories for a given tool, providing process supervision that is more effective than outcome supervision.
Extensive experiments on multiple tool-use and reasoning tasks demonstrate that InfCycle efficiently enables self-improvement.
It outperforms state-of-the-art baselines on StableToolBench, achieving a 75.4\% pass rate and a 79.6\% win rate with small (7B) models, without relying on external supervision or expert trajectories for warm-up.
Supplementary Material: zip
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9930