MedCalc-R1: Knowledge-Guided Reward Framework for Medical Mathematical Reasoning

ICLR 2026 Conference Submission 18918 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: medical mathematical reasoning, knowledge-guided reward, complex reasoning, large language model
Abstract: Medical mathematical reasoning is a critical component of clinical decision-making, where accuracy directly affects patient safety and treatment outcomes. Existing large language model approaches improve complex reasoning ability but often suffer from knowledge degradation, computational bias, and a lack of interpretability. Moreover, commonly used reward mechanisms rely heavily on coarse-grained acceptable ranges, which fail to guarantee stable and precise mathematical outputs. To address these challenges, we propose a knowledge-guided reward framework with two complementary mechanisms. First, a knowledge verification reward enforces explicit formula generation and uses an independent verification model to check both formulas and results, thereby mitigating knowledge forgetting, enhancing interpretability, and improving reasoning transparency. Second, a hybrid soft–hard reward mechanism treats clinical safety thresholds as hard constraints and adds progressive accuracy-based rewards as a soft optimization signal, improving both safety and precision. Extensive experiments on medical mathematical reasoning tasks demonstrate that our approach significantly outperforms existing methods in reasoning accuracy, knowledge robustness, and model generalization, validating the effectiveness and broad applicability of the proposed framework.
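The hybrid soft–hard reward described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual thresholds, tier values, or error metric, so everything concrete here is an assumption made for illustration: the function name `hybrid_reward`, the use of relative error, the 5% hard tolerance, and the three soft tiers are all hypothetical and should not be read as the authors' implementation.

```python
def hybrid_reward(predicted, reference, hard_tol=0.05,
                  soft_tiers=((0.001, 1.0), (0.01, 0.7), (0.05, 0.4))):
    """Illustrative hybrid soft-hard reward (all numbers are assumed).

    hard_tol   -- hypothetical clinical safety threshold on relative error;
                  answers outside it receive zero reward (hard constraint).
    soft_tiers -- hypothetical (error bound, reward) pairs, tightest first;
                  closer answers earn progressively more (soft reward).
    """
    # Relative error, falling back to absolute error when the reference is 0.
    if reference == 0:
        rel_err = abs(predicted)
    else:
        rel_err = abs(predicted - reference) / abs(reference)

    # Hard constraint: outside the safety threshold, no reward at all.
    if rel_err > hard_tol:
        return 0.0

    # Soft reward: return the reward of the tightest tier that is satisfied.
    for bound, reward in soft_tiers:
        if rel_err <= bound:
            return reward
    return 0.0


if __name__ == "__main__":
    for guess in (100.0, 100.5, 103.0, 110.0):
        print(guess, hybrid_reward(guess, 100.0))
```

Under these assumed settings, a prediction inside the safety threshold is never rewarded less than one outside it, while the tiers keep rewarding further gains in precision rather than treating every answer in the acceptable range as equally good.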
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 18918