Solve-Detect-Verify: Inference-Time Scaling with Flexible Generative Verifier

18 Sept 2025 (modified: 06 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Large Language Models, LLM Reasoning, Inference Time Scaling, Test Time Scaling, Token Efficiency, Reinforcement Finetuning, Verification, Thinking Fast and Slow
Abstract: Complex reasoning with Large Language Models (LLMs) demands a careful balance between accuracy and computational cost. Verification, crucial for reliability, exacerbates this challenge. Existing methods often force a stark trade-off: robust process-based verifiers incur prohibitive costs due to iterative recomputation, while fast, efficient verifiers suffer from low precision. We introduce FlexiVe, a unified generative verifier designed to navigate this trade-off. FlexiVe dynamically allocates compute between rapid "fast thinking" and deliberative "slow thinking." A key innovation is our training strategy: we use Reinforcement Learning (GRPO) to specifically enhance the reliability of the fast mode. Remarkably, this targeted training generalizes, elevating the slow mode to state-of-the-art performance. To deploy FlexiVe optimally, we propose the Solve-Detect-Verify (SDV) pipeline. SDV moves beyond static Best-of-N ranking, employing an efficient iterative refinement process that detects solution completion to curtail "overthinking" and uses FlexiVe's feedback for targeted correction. Our results demonstrate significant improvements in both accuracy and efficiency. FlexiVe establishes a new open-source state-of-the-art on ProcessBench, outperforming the much larger GenPRM-32B while requiring ~2.3x fewer TFLOPS and 15x less training data. On the challenging AIME 2024 benchmark, the full SDV pipeline achieves 83.3% accuracy, surpassing strong baselines.
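The control flow the abstract describes can be sketched as pseudocode. This is a hypothetical illustration, not the paper's implementation: all names (`solve`, `detect_complete`, `verify_fast`, `verify_slow`, `refine`, `Verdict`) are illustrative stand-ins for the solver, the completion detector that curtails overthinking, FlexiVe's two verification modes, and feedback-driven correction.

```python
# Hypothetical sketch of a Solve-Detect-Verify (SDV) style loop.
# The callables below are assumed interfaces, not the paper's actual API.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verdict:
    correct: bool
    confident: bool      # fast mode may be unsure -> escalate to slow mode
    feedback: str = ""   # used for targeted correction when incorrect

def solve_detect_verify(
    problem: str,
    solve: Callable[[str], str],                 # generates a candidate solution
    detect_complete: Callable[[str], bool],      # detects solution completion early
    verify_fast: Callable[[str, str], Verdict],  # cheap "fast thinking" check
    verify_slow: Callable[[str, str], Verdict],  # deliberative "slow thinking" check
    refine: Callable[[str, str, str], str],      # correction from verifier feedback
    max_rounds: int = 3,
) -> Optional[str]:
    candidate = solve(problem)
    for _ in range(max_rounds):
        if not detect_complete(candidate):
            candidate = solve(problem)           # solution not yet complete: resample
            continue
        verdict = verify_fast(problem, candidate)
        if not verdict.confident:                # allocate extra compute only when needed
            verdict = verify_slow(problem, candidate)
        if verdict.correct:
            return candidate
        candidate = refine(problem, candidate, verdict.feedback)
    return None                                  # no verified solution within budget
```

The key design point mirrored here is that the expensive slow-thinking verifier is invoked only when the fast mode is not confident, which is how the pipeline trades accuracy against compute.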
Primary Area: foundation or frontier models, including LLMs
Submission Number: 11911