Abstract: Large Language Models have advanced significantly in complex reasoning, often leveraging external reward models to improve the reliability of their multi-step processes. However, existing process verification methods struggle to reliably assess incomplete reasoning traces and are limited by the cost of high-quality human annotations or the inherent noise in automatically generated labels.
Therefore, we present Dyve, a dynamic process verifier that enhances reasoning error detection in large language models by integrating fast and slow thinking, inspired by Kahneman's dual-system theory. Dyve adaptively applies immediate token-level confirmation (System 1) for straightforward steps and comprehensive analysis (System 2) for complex ones. Unlike traditional verifiers that only evaluate final outputs, Dyve employs a step-wise consensus-filtered supervision strategy, leveraging Monte Carlo estimation, LLM-as-a-Judge, and specialized reasoning models to extract high-quality training signals from noisy rollouts. Experimental results on ProcessBench and the MATH dataset confirm that Dyve significantly outperforms existing process-based verifiers and boosts performance in Best-of-N settings while maintaining computational efficiency by strategically allocating verification resources.
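The adaptive System 1 / System 2 routing described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the step representations, the complexity flag, the `system1_check` heuristic, and the rollout success rates inside `system2_verify` are all hypothetical stand-ins for the real token-level verifier and Monte Carlo consensus procedure.

```python
import random

random.seed(0)

def system1_check(step: str) -> bool:
    """Fast, token-level confirmation (System 1).
    Hypothetical stand-in: accept any step without an error marker."""
    return "ERROR" not in step

def system2_verify(step: str, n_rollouts: int = 8, threshold: float = 0.5) -> bool:
    """Slow, comprehensive analysis (System 2), approximated here by
    Monte Carlo rollouts: each rollout casts a noisy correctness vote,
    and the step passes if the consensus rate clears the threshold."""
    # Hypothetical per-rollout success probability for this toy example.
    base = 0.9 if "ERROR" not in step else 0.2
    votes = [random.random() < base for _ in range(n_rollouts)]
    return sum(votes) / n_rollouts >= threshold

def verify_trace(steps, hard_marker="?"):
    """Route each step adaptively: System 1 for straightforward steps,
    System 2 for steps flagged as complex (here, containing hard_marker).
    Stop at the first detected reasoning error."""
    results = []
    for step in steps:
        ok = system2_verify(step) if hard_marker in step else system1_check(step)
        results.append(ok)
        if not ok:
            break
    return results

trace = ["x = 2 + 3 = 5", "? y = 5 * 4 = 20", "ERROR z = 20 - 1 = 18"]
print(verify_trace(trace))
```

The routing is what buys the efficiency claimed in the abstract: cheap checks cover the easy steps, and the expensive Monte Carlo consensus is spent only where a step is flagged as complex.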
Paper Type: Long
Research Area: Generation
Research Area Keywords: Large Language Models (LLMs), Multi-Step Reasoning, Process Verification, Process Reward Modeling, Dual-System Theory (System 1 & System 2), Adaptive Computation Budgeting, Monte Carlo Estimation, Consensus Filtering, Process Supervision
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Keywords: Large Language Models (LLMs), Multi-Step Reasoning, Process Verification, Process Reward Modeling, Dual-System Theory (System 1 & System 2), Adaptive Computation Budgeting, Monte Carlo Estimation, Consensus Filtering, Process Supervision
Submission Number: 3585