Keywords: LLM, Verification, Tool-integrated reasoning
Abstract: Answer verification methods are widely employed in language model training pipelines spanning data curation, evaluation, and reinforcement learning with verifiable rewards (RLVR).
While prior work focuses on developing unified verifiers applicable across multiple reasoning scenarios, significant challenges remain in computation-oriented scientific domains, such as algebraic equivalence checking.
In this paper, we introduce CosineVerifier, a tool-augmented verifier that leverages external executors to perform precise computations and symbolic simplifications.
CosineVerifier enables robust verification that goes beyond simple semantic matching.
To train this accurate tool-augmented verifier, we propose a novel data-augmentation method for verifier training data, together with a two-stage training framework that improves the correctness of tool-invoked verifications on computation-heavy questions.
Extensive experiments across STEM, QA, and long-form reasoning tasks demonstrate CosineVerifier's robust generalization, achieving state-of-the-art performance on VerifyBench-Hard and SCI-Bench. Furthermore, when employed as an RLVR reward model, CosineVerifier consistently outperforms both rubric- and model-based verifiers on AIME'24, AIME'25, and GPQA-D, highlighting its potential to advance LLM reasoning.
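The abstract does not specify how CosineVerifier's external executor performs algebraic equivalence checking. As a minimal, hypothetical sketch of what such a tool call could look like (not the paper's implementation), the snippet below checks whether two candidate answers are algebraically equivalent by evaluating both expressions at random points; the function name and interface are illustrative assumptions.

```python
import random

def numerically_equivalent(expr_a, expr_b, var="x", trials=20, tol=1e-9):
    """Probabilistic equivalence check between two single-variable
    expressions: sample random points and compare the evaluated values.
    (Illustrative sketch only; a real executor might instead call a
    computer-algebra system to simplify the difference symbolically.)"""
    for _ in range(trials):
        point = {var: random.uniform(-10.0, 10.0)}
        try:
            # Restrict builtins so only the sampled variable is visible.
            a = eval(expr_a, {"__builtins__": {}}, point)
            b = eval(expr_b, {"__builtins__": {}}, point)
        except (ZeroDivisionError, ValueError, OverflowError):
            continue  # skip points outside the expressions' common domain
        if abs(a - b) > tol * max(1.0, abs(a), abs(b)):
            return False  # a single mismatch proves non-equivalence
    return True

print(numerically_equivalent("(x + 1)**2", "x**2 + 2*x + 1"))  # True
print(numerically_equivalent("(x + 1)**2", "x**2 + 1"))        # False
```

This goes beyond string or semantic matching: `(x + 1)**2` and `x**2 + 2*x + 1` differ textually but are judged equivalent, which is the kind of robustness the abstract attributes to execution-backed verification.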
Paper Type: Long
Research Area: Natural Language Generation
Research Area Keywords: generation, automatic evaluation, applications, chain-of-thought
Contribution Types: Publicly available software and/or pre-trained models
Languages Studied: English
Submission Number: 9694