Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation

ICLR 2026 Conference Submission3697 Authors

Published: 26 Jan 2026, Last Modified: 26 Jan 2026, Venue: ICLR 2026, License: CC BY 4.0
Keywords: Mathematical Reasoning, Group Relative Policy Optimization, Question Reformulation
TL;DR: We propose MathForge, a framework that improves mathematical reasoning by targeting harder questions from both algorithmic and data perspectives. It comprises Difficulty-Aware Group Policy Optimization (DGPO) and Multi-Aspect Question Reformulation (MQR).
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) offers a robust mechanism for enhancing the mathematical reasoning capabilities of large models. However, we find that existing methods pay insufficient attention to harder questions, from both algorithmic and data perspectives. Algorithmically, the widely used Group Relative Policy Optimization (GRPO) and its variants exhibit a critical limitation: their advantage estimation introduces an implicit imbalance in which the magnitude of policy updates is lower for harder questions. From a data-centric viewpoint, existing augmentation approaches primarily rephrase questions to enhance diversity, without systematically increasing their intrinsic difficulty. To address these issues, we propose MathForge, a two-pronged framework that improves mathematical reasoning by targeting harder questions from both perspectives. It comprises a Difficulty-Aware Group Policy Optimization (DGPO) algorithm and a Multi-Aspect Question Reformulation (MQR) strategy. Specifically, DGPO first rectifies the implicit imbalance in GRPO via difficulty-balanced group advantage estimation and then further prioritizes more challenging questions through difficulty-aware question-level weighting. Meanwhile, MQR reformulates questions across multiple aspects to increase their difficulty while preserving the original gold answer. Overall, MathForge creates a synergistic loop: MQR expands the data frontier, and DGPO efficiently masters the augmented data. Extensive experiments demonstrate that MathForge markedly outperforms existing methods on various mathematical reasoning tasks. The code and augmented data will be made available.
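The imbalance the abstract attributes to GRPO can be illustrated with a minimal sketch. It is not the authors' code and does not implement DGPO; it only assumes the commonly described GRPO advantage, (reward - group mean) / group standard deviation, with binary verifiable rewards and a population standard deviation. The function names and the `eps` stabilizer are hypothetical. Under these assumptions, a group of G responses with pass rate p has total absolute advantage 2G·sqrt(p(1-p)), which peaks at p = 0.5 and shrinks as the question gets harder (p approaching 0).

```python
import math


def grpo_advantages(rewards, eps=1e-8):
    """Group-normalized advantages: (r - mean) / (std + eps).

    Assumption: population standard deviation; `eps` is a hypothetical
    stabilizer that avoids division by zero when all rewards are equal.
    """
    group_size = len(rewards)
    mean = sum(rewards) / group_size
    variance = sum((r - mean) ** 2 for r in rewards) / group_size
    std = math.sqrt(variance)
    return [(r - mean) / (std + eps) for r in rewards]


def total_update_magnitude(num_correct, group_size):
    """Sum of |advantage| over a group with binary (0/1) rewards."""
    rewards = [1.0] * num_correct + [0.0] * (group_size - num_correct)
    return sum(abs(a) for a in grpo_advantages(rewards))


if __name__ == "__main__":
    group_size = 8
    for num_correct in range(group_size + 1):
        pass_rate = num_correct / group_size
        magnitude = total_update_magnitude(num_correct, group_size)
        print(f"pass rate {pass_rate:.3f} -> total |advantage| {magnitude:.3f}")
```

In this sketch, a question answered correctly in 1 of 8 samples receives a smaller total advantage magnitude than one answered correctly in 4 of 8, which is the kind of imbalance the abstract says difficulty-balanced group advantage estimation is meant to rectify.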
Primary Area: foundation or frontier models, including LLMs
Submission Number: 3697