BEACON: Bayesian Optimal Stopping for Efficient LLM Sampling

19 Sept 2025 (modified: 12 Feb 2026) · ICLR 2026 Conference Desk Rejected Submission · CC BY 4.0
Keywords: Bayesian Optimal Stopping, Best of N Sampling, LLMs, Test Time Scaling, Reward Model, Decision Theory, Search Theory
TL;DR: We propose a principled framework that dynamically adapts LLM sampling using optimal stopping theory with Bayesian learning, reducing computation while maintaining response quality.
Abstract: Sampling multiple responses is a common technique for improving the quality of LLM outputs, but it comes at the cost of additional computational resources. Deciding when to stop generating further samples therefore requires balancing response quality against efficiency. Existing methods typically rely on heuristics rather than theoretically grounded optimization, leading either to inefficient under-exploration or to wasted resources through oversampling. We introduce the Bayesian Efficient Adaptive Criterion for Optimal N-stopping (BEACON), a principled adaptive sampling framework grounded in the theory of sequential search with Bayesian learning. BEACON makes sampling decisions in a Bayesian-optimal manner by sequentially generating samples from the policy LLM and updating a posterior belief over the reward distribution of responses to the query. The framework determines optimal stopping points by balancing reward consistency against computational cost, terminating when the expected marginal utility of further exploration no longer justifies the expense. We establish theoretical optimality guarantees for BEACON and provide a complexity analysis demonstrating its computational tractability. Empirical results on diverse reasoning and alignment benchmarks show that BEACON reduces average sampling requirements by up to 80% compared to baselines while matching or exceeding response quality. Beyond benchmark performance, we extend BEACON's applicability to cost-efficient preference data generation, provide principled guidance for hyperparameter selection, and present extensions to batch sampling, offering actionable insights for practitioners and laying a foundation for future work on adaptive sampling.
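The stopping rule the abstract describes (sample sequentially, update a posterior belief over rewards, stop when the expected marginal gain of one more sample falls below its cost) can be sketched under strong simplifying assumptions. The snippet below is an illustrative toy, not the paper's BEACON algorithm: it assumes rewards are Gaussian with known noise variance, maintains a conjugate Normal posterior over the mean reward, and stops when the expected improvement of one more draw over the current best drops below a fixed per-sample cost. All names and parameters (`adaptive_best_of_n`, `cost`, `noise_var`, etc.) are hypothetical.

```python
import math
import random

def expected_improvement(best, mu, sigma):
    """E[max(R - best, 0)] for R ~ Normal(mu, sigma^2)."""
    if sigma <= 0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * pdf + (mu - best) * cdf

def adaptive_best_of_n(sample_reward, cost=0.05, max_n=32,
                       prior_mu=0.0, prior_var=1.0, noise_var=1.0):
    """Toy adaptive best-of-N: stop when expected gain < per-sample cost."""
    best, rewards = -math.inf, []
    for n in range(1, max_n + 1):
        r = sample_reward()          # stand-in for an LLM sample + reward model score
        rewards.append(r)
        best = max(best, r)
        # Conjugate Normal update of the posterior over the mean reward
        precision = 1.0 / prior_var + n / noise_var
        post_var = 1.0 / precision
        post_mu = post_var * (prior_mu / prior_var + sum(rewards) / noise_var)
        # Predictive std of the next draw under the current posterior
        pred_sigma = math.sqrt(post_var + noise_var)
        if expected_improvement(best, post_mu, pred_sigma) < cost:
            break                    # further sampling no longer justifies the expense
    return best, n
```

Raising `cost` makes the rule terminate earlier (fewer samples, possibly lower best reward); lowering it recovers behavior closer to fixed best-of-N with N = `max_n`.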
Primary Area: probabilistic methods (Bayesian methods, variational inference, sampling, UQ, etc.)
Submission Number: 15627