Abstract: Large Language Models have advanced significantly in complex reasoning, often leveraging external reward models to improve the reliability of their multi-step processes. However, existing process verification methods struggle to reliably assess incomplete reasoning traces and are limited by the cost of high-quality human annotations or the inherent noise in automatically generated labels.
Therefore, we present Dyve, a dynamic process verifier that enhances reasoning error detection in large language models by integrating fast and slow thinking, inspired by Kahneman's dual-system theory. Dyve adaptively applies immediate token-level confirmation (System 1) for straightforward steps and comprehensive analysis (System 2) for complex ones. Unlike traditional verifiers that only evaluate final outputs, Dyve employs a step-wise consensus-filtered supervision strategy, leveraging Monte Carlo estimation, LLM-as-a-Judge, and specialized reasoning models to extract high-quality training signals from noisy rollouts. Experimental results on ProcessBench and the MATH dataset confirm that Dyve significantly outperforms existing process-based verifiers and boosts performance in Best-of-N settings while maintaining computational efficiency by strategically allocating verification resources.
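The adaptive System 1 / System 2 routing described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the step representations, the complexity flag, the `system1_check` heuristic, and the rollout success rates inside `system2_verify` are all hypothetical stand-ins for the real token-level verifier and Monte Carlo consensus procedure.

```python
import random

random.seed(0)

def system1_check(step: str) -> bool:
    """Fast, token-level confirmation (System 1).
    Hypothetical stand-in: accept any step without an error marker."""
    return "ERROR" not in step

def system2_verify(step: str, n_rollouts: int = 8, threshold: float = 0.5) -> bool:
    """Slow, comprehensive analysis (System 2), approximated here by
    Monte Carlo rollouts: each rollout casts a noisy correctness vote,
    and the step passes if the consensus rate clears the threshold."""
    # Hypothetical per-rollout success probability for this toy example.
    base = 0.9 if "ERROR" not in step else 0.2
    votes = [random.random() < base for _ in range(n_rollouts)]
    return sum(votes) / n_rollouts >= threshold

def verify_trace(steps, hard_marker="?"):
    """Route each step adaptively: System 1 for straightforward steps,
    System 2 for steps flagged as complex (here, containing hard_marker).
    Stop at the first detected reasoning error."""
    results = []
    for step in steps:
        ok = system2_verify(step) if hard_marker in step else system1_check(step)
        results.append(ok)
        if not ok:
            break
    return results

trace = ["x = 2 + 3 = 5", "? y = 5 * 4 = 20", "ERROR z = 20 - 1 = 18"]
print(verify_trace(trace))
```

The routing is what buys the efficiency claimed in the abstract: cheap checks cover the easy steps, and the expensive Monte Carlo consensus is spent only where a step is flagged as complex.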
Paper Type: Long
Research Area: Generation
Research Area Keywords: Large Language Models (LLMs), Multi-Step Reasoning, Process Verification, Process Reward Modeling, Dual-System Theory (System 1 & System 2), Adaptive Computation Budgeting, Monte Carlo Estimation, Consensus Filtering, Process Supervision
Contribution Types: NLP engineering experiment, Approaches low compute settings-efficiency, Publicly available software and/or pre-trained models, Data resources
Languages Studied: English
Keywords: Large Language Models (LLMs), Multi-Step Reasoning, Process Verification, Process Reward Modeling, Dual-System Theory (System 1 & System 2), Adaptive Computation Budgeting, Monte Carlo Estimation, Consensus Filtering, Process Supervision
Submission Number: 3585