Token Bayesian Optimization: Reasoning LLMs Think Better with the Right Length

ICLR 2026 Conference Submission 17577 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: bayesian optimization, reasoning llms
Abstract: Reasoning-based Large Language Models (LLMs) exhibit strong capabilities on complex tasks such as mathematics, programming, and logic, with performance highly dependent on the length of the generated reasoning chains. However, the relationship between reasoning length and task performance is not simply linear; instead, it exhibits task-dependent, non-monotonic, and multi-peaked patterns. Short reasoning chains often result in incomplete arguments, while overly long ones may introduce noise or logical inconsistencies. Existing approaches based on reinforcement learning require extensive supervision, while heuristic strategies built on fixed token budgets struggle to identify the optimal reasoning length. To address this, we propose Token Bayesian Optimization (TBO), a supervision-free and task-agnostic framework for reasoning-length optimization. TBO combines coarse-grained boundary initialization with Bayesian iterative search, leveraging the evaluative power of LLMs to actively explore the token-length space and progressively converge toward the globally optimal reasoning length. Experiments on multiple standard reasoning benchmarks demonstrate that TBO consistently discovers reasoning lengths that better unlock the model's potential, achieving significant accuracy gains over existing baselines. The code is publicly available at: https://anonymous.4open.science/r/TBO-BEFD/.
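The core loop described in the abstract, boundary initialization followed by Bayesian iterative search over a 1-D token-length space, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function and parameter names (`optimize_token_budget`, `score_fn`, the RBF length scale, the UCB acquisition with `kappa`) are assumptions chosen for the sketch, and `score_fn` stands in for whatever task-accuracy or LLM-based evaluation signal TBO actually uses.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=400.0):
    # Squared-exponential kernel over scalar token budgets
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    # Standard Gaussian-process posterior mean and std via Cholesky
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = 1.0 - np.sum(v ** 2, axis=0)  # k(x,x) = 1 for this kernel
    return mu, np.sqrt(np.maximum(var, 0.0))

def optimize_token_budget(score_fn, low, high, n_init=4, n_iter=10,
                          kappa=2.0, seed=0):
    """Search for the token budget maximizing score_fn (hypothetical evaluator).

    Coarse boundary initialization: evaluate both endpoints plus a few
    random interior budgets, then iteratively pick the next budget by a
    UCB acquisition (posterior mean + kappa * posterior std).
    """
    rng = np.random.default_rng(seed)
    x = np.concatenate([[low, high], rng.uniform(low, high, n_init - 2)])
    y = np.array([score_fn(int(t)) for t in x])
    grid = np.linspace(low, high, 200)  # candidate token budgets
    for _ in range(n_iter):
        mu, sd = gp_posterior(x, y, grid)
        nxt = grid[np.argmax(mu + kappa * sd)]  # explore/exploit trade-off
        x = np.append(x, nxt)
        y = np.append(y, score_fn(int(nxt)))
    best = int(np.argmax(y))
    return int(x[best]), float(y[best])
```

On a synthetic single-peaked score curve (e.g. accuracy peaking near 900 tokens), the loop concentrates its evaluations around the peak after a handful of iterations; the multi-peaked landscapes described in the abstract are handled by the same acquisition, since high posterior uncertainty keeps unexplored regions attractive.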
Supplementary Material: pdf
Primary Area: foundation or frontier models, including LLMs
Submission Number: 17577