Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification

ACL ARR 2024 December Submission 1479 Authors

16 Dec 2024 (modified: 05 Feb 2025) · ACL ARR 2024 December Submission · CC BY 4.0
Abstract: Despite significant advancements in the general capability of large language models (LLMs), they continue to struggle with consistent and accurate reasoning. One key limitation is that LLMs are trained primarily on correct solutions, reducing their ability to detect and learn from errors and hampering their ability to reliably verify and rank outputs. To address this, we adopt a widely used method for scaling up inference-time computation: generating multiple reasoning paths and employing verifiers to assess and rank the generated outputs by correctness. To better understand different verifier training methods, we introduce a comprehensive dataset consisting of correct and incorrect solutions for math and code tasks, generated by multiple LLMs. This diverse set of solutions enables verifiers to more effectively distinguish and rank correct answers over erroneous outputs. Moreover, to leverage the unique strengths of different reasoning strategies, we propose a novel collaborative method that integrates Chain-of-Thought (CoT) and Program-of-Thought (PoT) solutions for verification. Our verifier Math-Rev demonstrates substantial performance gains over existing LLMs, achieving state-of-the-art results on benchmarks such as GSM8K and MATH.
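The abstract describes a generate-then-verify pipeline: sample multiple candidate solutions, score each with a verifier, and return the top-ranked answer, with CoT and PoT verdicts combined. The sketch below illustrates that flow only; `generate_candidates`, `verifier_score`, and the `alpha` weighting are hypothetical stand-ins, not the paper's Math-Rev models or exact scoring rule.

```python
# Minimal sketch of verifier-guided best-of-N selection with a combined
# CoT + PoT score. All model calls are stubbed; only the overall
# generate -> score -> rank structure follows the abstract.
import random
from dataclasses import dataclass


@dataclass
class Candidate:
    cot: str     # Chain-of-Thought solution text
    pot: str     # Program-of-Thought solution (executable code)
    answer: str  # final answer extracted from the solution


def generate_candidates(question: str, n: int = 8) -> list[Candidate]:
    """Placeholder for sampling n reasoning paths from an LLM."""
    return [
        Candidate(cot=f"step-by-step reasoning #{i}",
                  pot=f"print({i})",
                  answer=str(i))
        for i in range(n)
    ]


def verifier_score(question: str, solution: str) -> float:
    """Placeholder for a trained verifier rating solution correctness in [0, 1]."""
    return random.random()


def select_best(question: str, n: int = 8, alpha: float = 0.5) -> Candidate:
    """Rank candidates by a weighted mix of CoT and PoT verifier scores."""
    candidates = generate_candidates(question, n)

    def combined(c: Candidate) -> float:
        return (alpha * verifier_score(question, c.cot)
                + (1 - alpha) * verifier_score(question, c.pot))

    return max(candidates, key=combined)


if __name__ == "__main__":
    best = select_best("What is 12 * 7?")
    print("Selected answer:", best.answer)
```

In practice the stubs would be replaced by an LLM sampler and the trained verifier, and the combination of CoT and PoT scores would follow the paper's collaborative verification scheme rather than a fixed linear weight.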
Paper Type: Long
Research Area: NLP Applications
Research Area Keywords: Math Reasoning, Verification
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 1479