CARE: Curriculum-Aware Rubric Evolution for Open-Ended Text Generation

ACL ARR 2026 January Submission 7182 Authors

06 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Reinforcement Learning, Reward Modeling, Curriculum Learning, Open-Ended Text Generation
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated significant potential in tasks characterized by well-defined objective criteria. However, extending this paradigm to open-ended generation tasks, which lack ground truth and are inherently subjective, poses fundamental challenges. Recent research has shifted towards Reinforcement Learning from AI Feedback (RLAIF). Nevertheless, existing methods largely rely on human-crafted or predetermined rubrics maintained via static or rule-based update mechanisms, which fail to adapt promptly to the evolving capabilities of the policy model and thereby yield delayed and sparse critical reward signals. To overcome these limitations, we propose Curriculum-Aware Rubric Evolution (CARE), a rubric evolution method that integrates curriculum awareness with diagnosis-driven evolution. CARE adaptively adjusts the complexity of rubrics across training stages by monitoring the statistical properties of the reward distribution during policy training. Simultaneously, it introduces a discrepancy-based diagnostic sampling strategy that prioritizes high signal-to-noise critical samples for targeted rubric evolution. Experiments demonstrate that CARE consistently enhances alignment quality during training and achieves superior generation performance compared to multiple strong baselines. Our work provides an adaptive solution for scalable oversight in highly subjective generation scenarios.
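The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the function names (`adjust_rubric_level`, `diagnostic_samples`), the mean/variance thresholds, and the two-judge discrepancy proxy are all hypothetical stand-ins for (1) curriculum-aware adjustment of rubric complexity from reward-distribution statistics and (2) discrepancy-based diagnostic sampling.

```python
import statistics

def adjust_rubric_level(rewards, level, low=0.35, high=0.85, max_level=5):
    """Curriculum-aware adjustment (illustrative): escalate rubric
    complexity when the policy saturates the current rubric (high mean
    reward, low spread), and relax it when rewards collapse.
    Thresholds are arbitrary placeholders, not from the paper."""
    mean_r = statistics.mean(rewards)
    std_r = statistics.pstdev(rewards)
    if mean_r > high and std_r < 0.1 and level < max_level:
        return level + 1  # rubric too easy for the current policy
    if mean_r < low and level > 1:
        return level - 1  # rubric too hard; reward signal too sparse
    return level

def diagnostic_samples(batch, k=2):
    """Discrepancy-based sampling (illustrative): rank samples by the
    disagreement between two judge scores and keep the top k as
    high-signal candidates for targeted rubric evolution."""
    return sorted(batch, key=lambda s: -abs(s["score_a"] - s["score_b"]))[:k]
```

Under this sketch, a batch whose rewards cluster near the ceiling would trigger a complexity increase, while samples where two judges disagree most would be routed to rubric evolution first.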
Paper Type: Long
Research Area: Natural Language Generation
Research Area Keywords: applications, fine-tuning, LLM/AI agents, prompting, safety and alignment, robustness
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 7182