Test-time Verification via Optimal Transport: Coverage, ROC, & Sub-optimality

Published: 29 May 2026, Last Modified: 29 May 2026
Venue: HiLD at ICML 2026 Spotlight
License: CC BY 4.0
Keywords: Test-time verification, optimal transport
Abstract: While test-time scaling with verification has shown promise in improving the performance of large language models (LLMs), the role of the verifier and its imperfections remain underexplored. The effect of verification manifests through the interaction of three quantities: (i) the generator’s coverage, (ii) the verifier’s region of convergence (ROC), and (iii) the sampling algorithm’s sub-optimality. Though recent studies capture subsets of these factors, a unified framework quantifying the geometry of their interplay is missing. We frame verifiable test-time scaling as a transport problem. This framing characterizes the interaction of coverage, ROC, and sub-optimality, and reveals that the sub-optimality–coverage curve exhibits three regimes: a transport regime, where sub-optimality increases with coverage; a policy improvement regime, where sub-optimality may decrease with coverage, depending on the verifier’s ROC; and a saturation regime, where sub-optimality plateaus, unaffected by coverage. We further propose and analyze two classes of sampling algorithms, sequential and batched, and examine how their computational complexities shape these trade-offs. Empirical results with Qwen, Llama, and Gemma models corroborate our theoretical findings.
Submission Number: 179