Majority of the Bests: Improving Best-of-N via Bootstrapping

Published: 09 Jul 2025, Last Modified: 16 Jul 2025 · AI4Math@ICML25 Poster · CC BY-NC-SA 4.0
Keywords: Best-of-N, Test-time Scaling, Inference Time Computation
TL;DR: We propose Majority of the Bests, a simple and scalable inference-time selection method that outperforms Best-of-N.
Abstract: Inference-time computational methods significantly enhance the reasoning abilities of Large Language Models (LLMs). Among these, Best-of-N has gained attention for its simplicity and scalability. It generates $N$ solutions from the LLM and selects the best one based on the reward model's evaluation. Due to imperfect rewards, even with a large $N$, the probability of selecting the correct answer does not necessarily converge to one. To mitigate this limitation, we propose Majority-of-the-Bests (MoB), a novel and hyperparameter-free selection mechanism that estimates the output distribution of Best-of-N via bootstrapping and selects its mode. Experimental results across five benchmarks, three different base LLMs, and two reward models demonstrate consistent improvements over Best-of-N in 25 out of 30 setups. We further provide theoretical results for the consistency of the bootstrapping.
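The selection mechanism described in the abstract can be sketched as follows. This is a minimal illustrative reading, not the paper's implementation: it assumes each of the $N$ sampled solutions comes with a final answer and a scalar reward, resamples the $N$ scored solutions with replacement to simulate Best-of-N draws, and returns the modal winner. The function name, signature, and fixed bootstrap count are hypothetical.

```python
import random
from collections import Counter

def majority_of_the_bests(samples, num_bootstrap=1000, seed=0):
    """Hedged sketch of Majority-of-the-Bests (MoB).

    `samples` is a list of (answer, reward) pairs, where each reward is a
    reward model's score for one sampled solution. We approximate the
    output distribution of Best-of-N by bootstrapping: each replicate
    resamples the N scored solutions with replacement and records the
    answer of the highest-reward solution in the replicate. MoB then
    selects the mode of that estimated distribution.
    """
    rng = random.Random(seed)
    n = len(samples)
    winner_counts = Counter()
    for _ in range(num_bootstrap):
        # Draw a bootstrap replicate of size N with replacement...
        replicate = [samples[rng.randrange(n)] for _ in range(n)]
        # ...and record this replicate's Best-of-N winner.
        best_answer, _ = max(replicate, key=lambda s: s[1])
        winner_counts[best_answer] += 1
    # Select the mode of the bootstrapped Best-of-N output distribution.
    return winner_counts.most_common(1)[0][0]
```

Unlike plain Best-of-N, which trusts the single highest-reward sample, the mode of the bootstrap distribution aggregates evidence across all samples sharing an answer, which is what makes the method robust to imperfect rewards.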
Submission Number: 144