TriMER: Balancing Efficiency and Accuracy in Mathematical Reasoning through a Three-Stage LLM Pipeline

ACL ARR 2025 May Submission6198 Authors

20 May 2025 (modified: 03 Jul 2025) · ACL ARR 2025 May Submission · CC BY 4.0
Abstract: Despite remarkable advances in Large Language Models (LLMs), mathematical reasoning remains a critical frontier where models struggle with accuracy, reliability, and computational efficiency, particularly on competition-level problems. Current approaches face fundamental limitations: distillation alone fails to capture reasoning depth, reinforcement learning demands prohibitive computational resources, and ensemble methods multiply inference costs. We introduce TriMER (Triple-stage Mathematical Efficient Reasoning), a three-stage framework that combines reasoning-capability distillation, Group Relative Policy Optimization (GRPO) with zero KL penalty, and multi-agent Preference Reward Model (PRM) reranking to address both reasoning quality and computational efficiency. Training on our curated dataset of 387K high-difficulty mathematical problems, we achieve state-of-the-art accuracy of 76.7% on the challenging AIME24 benchmark, surpassing Qwen-R1-Distilled-32B (73.3%) and Qwen2.5-Math-72B (30.0%) while using only 5,048 tokens per problem, a 6.3× reduction in token usage. Our multi-agent framework further improves accuracy to 79.9%, demonstrating the robustness gained from solution diversity. Extensive ablation studies confirm that each component contributes materially, with significant gains from our memory-optimized GRPO implementation. Code and data are available at https://anonymous.github.io/trimer/. By resolving the efficiency-accuracy trade-off that has hindered practical deployment of mathematical reasoning systems, our approach establishes a new paradigm for LLMs that can tackle complex mathematical problems while remaining computationally accessible in real-world applications.
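For context on the second stage, here is a minimal sketch of the GRPO objective, written at the sequence level for brevity and assuming the standard DeepSeekMath-style formulation; the abstract confirms only that the KL coefficient is set to zero, so the group normalization and clipping details below are standard assumptions rather than this paper's exact implementation. Given a question q, the policy samples a group of G candidate solutions o_1, ..., o_G with rewards r_1, ..., r_G:

\[ \hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_j\}_{j=1}^{G})}{\operatorname{std}(\{r_j\}_{j=1}^{G})}, \qquad \rho_i = \frac{\pi_\theta(o_i \mid q)}{\pi_{\theta_{\mathrm{old}}}(o_i \mid q)}, \]

\[ \mathcal{J}_{\mathrm{GRPO}}(\theta) = \mathbb{E}\left[ \frac{1}{G} \sum_{i=1}^{G} \min\left( \rho_i \hat{A}_i,\ \operatorname{clip}(\rho_i,\, 1-\epsilon,\, 1+\epsilon)\, \hat{A}_i \right) \right] - \beta\, \mathbb{D}_{\mathrm{KL}}\left[ \pi_\theta \,\Vert\, \pi_{\mathrm{ref}} \right], \quad \text{with } \beta = 0. \]

With β = 0 the reference-policy term vanishes, so no frozen reference model needs to be held in memory during training, which is consistent with (though not spelled out by) the abstract's mention of a memory-optimized GRPO implementation.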
Paper Type: Long
Research Area: Efficient/Low-Resource Methods for NLP
Research Area Keywords: efficiency, computational efficiency, token efficiency, resource optimization, mathematical reasoning, large language models, reinforcement learning
Contribution Types: Model analysis & interpretability, NLP engineering experiment, Approaches to low-resource settings, Data resources
Languages Studied: English
Submission Number: 6198