RePAIR: A Rule-based Process-Adaptive Reinforcement for Large Language Model Training

ICLR 2026 Conference Submission 15384 Authors

Published: 19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Symbolic Reasoning Rule, Verifiable Process Reward, Reinforcement Learning, Large Language Model
TL;DR: We construct adaptive verifiable process rewards for the reinforcement learning of large language models through symbolic reasoning rules automatically extracted from LLM-generated reasoning trajectories.
Abstract: Although reinforcement learning (RL) has demonstrated promise in enhancing the reasoning capabilities of Large Language Models (LLMs), the difficulty of reward design has hindered the exploitation of RL's full potential. Previous methods mainly fall into two categories: training a reward model based on human preferences, or designing verifiable outcome rewards. However, reward models often suffer from poor interpretability and require extensive annotation for effective training, while verifiable outcome rewards provide only sparse signals, leading to ambiguous credit assignment and low training efficiency in RL. These limitations call for rewards that provide more efficient, fine-grained supervision. To address them, we propose Rule-based Process-AdaptIve Reinforcement (RePAIR), which constructs adaptive verifiable process rewards through symbolic reasoning rules. These rules are automatically derived by integrating common-pattern mining and semantic summarization over the reasoning trajectories of LLMs. To stabilize training, RePAIR defines a reward-informativeness metric that dynamically adjusts rule weights as the policy updates. Extensive experiments across three reasoning tasks demonstrate that RePAIR achieves an average improvement of 6.03% and combines well with various advantage functions. Code and data will be available at https://anonymous.4open.science/r/RePAIR-8EFC.
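To make the abstract's mechanism concrete, here is a minimal Python sketch of the two ingredients it describes: a dense per-step reward computed from weighted symbolic rules, and an informativeness-based weight update. All names (`Rule`, `process_reward`, `update_weights`) and the specific informativeness heuristic are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of rule-based adaptive process rewards (not the
# authors' code). Rules are predicates over individual reasoning steps.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    """A symbolic reasoning rule: a boolean check over one reasoning step."""
    name: str
    check: Callable[[str], bool]
    weight: float = 1.0

def process_reward(trajectory: List[str], rules: List[Rule]) -> List[float]:
    """Per-step process reward: the weighted fraction of rules each
    reasoning step satisfies, giving a dense signal instead of a single
    sparse outcome reward."""
    total = sum(r.weight for r in rules) or 1.0
    return [sum(r.weight for r in rules if r.check(step)) / total
            for step in trajectory]

def update_weights(rules: List[Rule], firing_rates: List[float], lr: float = 0.1):
    """Assumed informativeness heuristic: a rule that fires on (almost)
    all or (almost) no steps under the current policy carries little
    training signal, so its weight decays; rules with intermediate
    firing rates are up-weighted. firing_rates[i] is the empirical
    firing rate of rules[i] over recent rollouts."""
    for r, p in zip(rules, firing_rates):
        informativeness = 4.0 * p * (1.0 - p)  # peaks at p = 0.5
        r.weight = max(1e-3, (1 - lr) * r.weight + lr * informativeness)

# Toy usage with two rules one might mine from arithmetic trajectories.
rules = [
    Rule("cites_equation", lambda s: "=" in s),
    Rule("states_conclusion", lambda s: s.lower().startswith("therefore")),
]
traj = ["Let x = 3 + 4.", "Compute x * 2 = 14.", "Therefore the answer is 14."]
print(process_reward(traj, rules))            # dense per-step rewards in [0, 1]
update_weights(rules, firing_rates=[0.9, 0.3])
print([(r.name, round(r.weight, 3)) for r in rules])
```

Under this reading, the process reward plugs into any advantage estimator in place of (or alongside) the outcome reward, which is consistent with the abstract's claim that RePAIR combines well with various advantage functions.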
Primary Area: reinforcement learning
Submission Number: 15384